🔍 Introduction
Exploratory Data Analysis (EDA) is the foundation of any successful data science project. It involves understanding the dataset, identifying patterns, detecting anomalies, checking assumptions, and summarizing the main characteristics. In this project, we perform EDA on the Titanic dataset, one of the most popular datasets in the data science community.
In this article, you’ll learn how to:
- Load and clean the dataset
- Handle missing values
- Explore distributions and relationships
- Visualize insights using `matplotlib`, `seaborn`, and `plotly`
- Deploy your EDA project using Streamlit
📂 Dataset Reference
We use the Kaggle Titanic Dataset, which contains information about passengers on the Titanic and whether they survived.
🧰 Tools and Libraries Used
- Python (Pandas, NumPy, Seaborn, Matplotlib, Plotly)
- Streamlit (for deployment)
- Jupyter Notebook (for development)
🧪 Step-by-Step Exploratory Data Analysis (EDA)
Import Required Libraries
```python
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import streamlit as st
```
Load the Dataset
```python
@st.cache_data
def load_data():
    return pd.read_csv('https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv')

df = load_data()
st.write("### Titanic Dataset", df.head())
```
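If you want the app to keep working without network access, `load_data` can fall back to a local copy of the same CSV. This is a minimal sketch, not part of the original app, and the local filename `titanic.csv` is an assumption:

```python
import os

@st.cache_data
def load_data():
    # Hypothetical local copy of the same CSV; otherwise fetch from the GitHub mirror
    local_path = 'titanic.csv'
    if os.path.exists(local_path):
        return pd.read_csv(local_path)
    return pd.read_csv('https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv')
```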
Data Overview
st.subheader("Dataset Information") st.write(df.info()) st.write("Shape of dataset:", df.shape) st.subheader("Missing Values") st.write(df.isnull().sum())
Handle Missing Values
```python
# Impute Age with the median and Embarked with the most frequent value;
# Cabin has too many missing entries, so drop the column entirely
df['Age'] = df['Age'].fillna(df['Age'].median())
df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])
df = df.drop('Cabin', axis=1)
```
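To confirm the imputation worked and that `Cabin` is gone, a quick re-check can follow. This small sketch continues the same script and was not part of the original:

```python
# Age and Embarked should now report zero missing values,
# and Cabin should no longer appear among the columns
st.subheader("Missing Values After Cleaning")
st.write(df.isnull().sum())
```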
Univariate Analysis
st.subheader("Survival Count") st.bar_chart(df['Survived'].value_counts()) st.subheader("Passenger Class Distribution") st.bar_chart(df['Pclass'].value_counts())
Bivariate Analysis
st.subheader("Survival Rate by Gender") fig = px.histogram(df, x='Sex', color='Survived', barmode='group') st.plotly_chart(fig) st.subheader("Survival Rate by Age") fig = px.histogram(df, x='Age', color='Survived', nbins=30) st.plotly_chart(fig) st.subheader("Correlation Matrix") corr = df.corr(numeric_only=True) fig, ax = plt.subplots() sns.heatmap(corr, annot=True, cmap='coolwarm', ax=ax) st.pyplot(fig)
Feature Engineering (Optional)
```python
# Family size = siblings/spouses + parents/children + the passenger themselves
df['FamilySize'] = df['SibSp'] + df['Parch'] + 1

st.subheader("Survival Rate by Family Size")
fig = px.histogram(df, x='FamilySize', color='Survived')
st.plotly_chart(fig)
```
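A small aggregate table makes the family-size effect explicit. This is an optional addition built on the `FamilySize` column created above:

```python
# Survival rate for each family size (mean of the 0/1 Survived column)
family_survival = df.groupby('FamilySize')['Survived'].mean().reset_index()
st.subheader("Survival Rate by Family Size (Table)")
st.dataframe(family_survival)
```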
🚀 Streamlit Deployment
To deploy this EDA project with Streamlit, follow these simple steps:
🛠️ 1. Save Your Script
Save your Python code in a file called `titanic_eda_app.py`.
🌐 2. Install Streamlit (if not installed)
```bash
pip install streamlit
```
🚀 3. Run the Streamlit App
```bash
streamlit run titanic_eda_app.py
```
🌍 4. Deploy Online (Optional)
You can deploy your app using Streamlit Community Cloud:
- Push your code to GitHub
- Visit streamlit.io/cloud
- Connect your GitHub repo and deploy in seconds
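Streamlit Community Cloud installs the packages listed in a `requirements.txt` at the root of your repo. A minimal example for this app could look like the following (version pins are omitted, and the list is inferred from the imports used above):

```text
streamlit
pandas
numpy
seaborn
matplotlib
plotly
```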
🔚 Conclusion
The Titanic Dataset Exploratory Data Analysis project is a classic yet powerful example of how data exploration can uncover meaningful insights. Through this analysis, we examined how passenger features such as age, gender, class, fare, and family size relate to the likelihood of survival. We handled missing values, visualized key distributions, and identified correlations that paint a clearer picture of survival patterns during the Titanic disaster.
This project allowed us to practice essential data analysis skills such as data cleaning, univariate and bivariate analysis, and visualization using libraries like Seaborn, Matplotlib, and Plotly. We also implemented feature engineering (e.g., family size) to enhance understanding of passenger dynamics. By deploying the project using Streamlit, we turned a static analysis into an interactive web app, making the insights easily accessible and visually engaging.
The Titanic Dataset Exploratory Data Analysis serves as a foundational project for any data analyst or data science enthusiast. It’s highly recommended for building your portfolio and demonstrating your ability to work with real-world data, perform detailed analysis, and deliver actionable insights through visual storytelling and app deployment.
This end-to-end project demonstrates skills in:
- Data preprocessing
- Visualization
- Insight discovery
- Python and Streamlit app deployment