🔍 Introduction
Exploratory Data Analysis (EDA) is the foundation of any successful data science project. It involves understanding the dataset, identifying patterns, detecting anomalies, checking assumptions, and summarizing the main characteristics. In this project, we perform EDA on the Titanic dataset, one of the most popular datasets in the data science community.
In this article, you’ll learn how to:
- Load and clean the dataset
- Handle missing values
- Explore distributions and relationships
- Visualize insights using `matplotlib`, `seaborn`, and `plotly`
- Deploy your EDA project using Streamlit
📂 Dataset Reference
We use the Kaggle Titanic Dataset, which contains information about passengers on the Titanic and whether they survived.
🧰 Tools and Libraries Used
- Python (Pandas, NumPy, Seaborn, Matplotlib, Plotly)
- Streamlit (for deployment)
- Jupyter Notebook (for development)
🧪 Step-by-Step Exploratory Data Analysis (EDA)
Import Required Libraries
```python
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import streamlit as st
```
Load the Dataset
```python
@st.cache_data
def load_data():
    return pd.read_csv('https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv')

df = load_data()
st.write("### Titanic Dataset", df.head())
```
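If you want the app to keep working without network access, `load_data` can fall back to a local copy of the same CSV. This is a minimal sketch, not part of the original app, and the local filename `titanic.csv` is an assumption:

```python
import os

@st.cache_data
def load_data():
    # Hypothetical local copy of the same CSV; otherwise fetch from the GitHub mirror
    local_path = 'titanic.csv'
    if os.path.exists(local_path):
        return pd.read_csv(local_path)
    return pd.read_csv('https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv')
```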
Data Overview
st.subheader("Dataset Information") st.write(df.info()) st.write("Shape of dataset:", df.shape) st.subheader("Missing Values") st.write(df.isnull().sum())
Handle Missing Values
```python
# Impute Age with the median and Embarked with the most frequent value;
# Cabin has too many missing entries, so drop the column entirely
df['Age'] = df['Age'].fillna(df['Age'].median())
df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])
df = df.drop('Cabin', axis=1)
```
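To confirm the imputation worked and that `Cabin` is gone, a quick re-check can follow. This small sketch continues the same script and was not part of the original:

```python
# Age and Embarked should now report zero missing values,
# and Cabin should no longer appear among the columns
st.subheader("Missing Values After Cleaning")
st.write(df.isnull().sum())
```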
Univariate Analysis
st.subheader("Survival Count") st.bar_chart(df['Survived'].value_counts()) st.subheader("Passenger Class Distribution") st.bar_chart(df['Pclass'].value_counts())
Bivariate Analysis
st.subheader("Survival Rate by Gender") fig = px.histogram(df, x='Sex', color='Survived', barmode='group') st.plotly_chart(fig) st.subheader("Survival Rate by Age") fig = px.histogram(df, x='Age', color='Survived', nbins=30) st.plotly_chart(fig) st.subheader("Correlation Matrix") corr = df.corr(numeric_only=True) fig, ax = plt.subplots() sns.heatmap(corr, annot=True, cmap='coolwarm', ax=ax) st.pyplot(fig)
Feature Engineering (Optional)
```python
# Family size = siblings/spouses + parents/children + the passenger themselves
df['FamilySize'] = df['SibSp'] + df['Parch'] + 1

st.subheader("Survival Rate by Family Size")
fig = px.histogram(df, x='FamilySize', color='Survived')
st.plotly_chart(fig)
```
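A small aggregate table makes the family-size effect explicit. This is an optional addition built on the `FamilySize` column created above:

```python
# Survival rate for each family size (mean of the 0/1 Survived column)
family_survival = df.groupby('FamilySize')['Survived'].mean().reset_index()
st.subheader("Survival Rate by Family Size (Table)")
st.dataframe(family_survival)
```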
🚀 Streamlit Deployment
To deploy this EDA project with Streamlit, follow these simple steps:
🛠️ 1. Save Your Script
Save your Python code in a file called `titanic_eda_app.py`.
🌐 2. Install Streamlit (if not installed)
```bash
pip install streamlit
```
🚀 3. Run the Streamlit App
```bash
streamlit run titanic_eda_app.py
```
🌍 4. Deploy Online (Optional)
You can deploy your app using Streamlit Community Cloud:
- Push your code to GitHub
- Visit streamlit.io/cloud
- Connect your GitHub repo and deploy in seconds
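Streamlit Community Cloud installs the packages listed in a `requirements.txt` at the root of your repo. A minimal example for this app could look like the following (version pins are omitted, and the list is inferred from the imports used above):

```text
streamlit
pandas
numpy
seaborn
matplotlib
plotly
```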
🔚 Conclusion
The Titanic Dataset Exploratory Data Analysis project is a classic yet powerful example of how data exploration can uncover meaningful insights. Through this analysis, we examined how passenger features such as age, gender, class, fare, and family size relate to the likelihood of survival. We handled missing values, visualized key distributions, and identified correlations that paint a clearer picture of survival patterns during the Titanic disaster.
This project allowed us to practice essential data analysis skills such as data cleaning, univariate and bivariate analysis, and visualization using libraries like Seaborn, Matplotlib, and Plotly. We also implemented feature engineering (e.g., family size) to enhance understanding of passenger dynamics. By deploying the project using Streamlit, we turned a static analysis into an interactive web app, making the insights easily accessible and visually engaging.
The Titanic Dataset Exploratory Data Analysis serves as a foundational project for any data analyst or data science enthusiast. It’s highly recommended for building your portfolio and demonstrating your ability to work with real-world data, perform detailed analysis, and deliver actionable insights through visual storytelling and app deployment.
This end-to-end project demonstrates skills in:
- Data preprocessing
- Visualization
- Insight discovery
- Python and Streamlit app deployment