Sentiment Analysis on Social Media Posts Using NLP

1. Introduction

Social media platforms are buzzing with opinions and emotions. From product reviews to political debates, user-generated content provides a goldmine of insights. But how can businesses and analysts extract these insights effectively? That’s where sentiment analysis comes into play.

In this project, we’ll build an end-to-end pipeline for Sentiment Analysis on Social Media Posts using Natural Language Processing (NLP) techniques. We’ll also create a real-time web app using Streamlit to classify user input into Positive, Negative, or Neutral sentiment.

This guide is aimed at aspiring data scientists, ML students, and AI learners, and covers practical NLP, data visualization, and deployment skills.

2. What is Sentiment Analysis?

Sentiment analysis, or opinion mining, is a sub-field of NLP that identifies and classifies the opinions and emotions expressed in text. In its most common form, it determines whether the sentiment of a piece of text is positive, negative, or neutral.

Common Use Cases:

Customer feedback analysis
Brand reputation management
Political opinion tracking
Market research

3. Why Analyze Social Media Sentiment?

Social media has become a dominant communication channel for individuals and organizations. Sentiment analysis of posts, tweets, and reviews allows:

Businesses to understand customer perception
Politicians to gauge public sentiment
Data scientists to identify trends

Fun Fact: Over 500 million tweets are sent every day — imagine the insights hidden in them!

4. Project Overview

In this end-to-end machine learning project, you will:

Clean and preprocess the Twitter dataset
Perform EDA and visualize sentiment trends
Extract features using NLP methods like TF-IDF
Train a classification model (Logistic Regression)
Build a web app using Streamlit for real-time sentiment prediction

5. Dataset Description

We use the Tweets.csv dataset, which contains over 3,000 tweets labeled as Positive, Negative, or Neutral.

Key Features:

tweet_id: Unique ID for the tweet
airline_sentiment: Target variable (positive, neutral, negative; stored in lowercase in the CSV)
text: Actual tweet content

You can download the dataset here or use the one uploaded in this project.

[Figure: Sentiment analysis workflow diagram]

6. Data Preprocessing

Raw tweets contain links, @mentions, punctuation, and emoji that add noise, so the text must be cleaned before it is fed to a model.

Code Snippet:

import re
import pandas as pd

# Load the dataset
df = pd.read_csv("Tweets.csv")

# Keep only the text and label columns, then drop rows with missing values
df = df[['text', 'airline_sentiment']].dropna()

# Clean text
def clean_text(text):
    text = re.sub(r"http\S+", "", text)      # remove links
    text = re.sub(r"@\w+", "", text)         # remove mentions
    text = re.sub(r"[^A-Za-z\s]", "", text)  # remove special chars, digits, emoji
    text = text.lower()
    return text

df['clean_text'] = df['text'].apply(clean_text)

Explanation:

Removes links, mentions, special characters
Converts text to lowercase
Leaves us with clean, model-ready text
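Stopword removal is a common optional step after cleaning. The sketch below shows one way to do it, under two assumptions of ours. First, it uses scikit-learn's built-in ENGLISH_STOP_WORDS list instead of NLTK's, so no corpus download is needed. Second, it keeps negation words such as "not", because deleting them can flip a tweet's sentiment. The helper name remove_stopwords and the NEGATIONS set are hypothetical.

```python
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

# Hypothetical choice: keep negations, since they carry sentiment
NEGATIONS = {"no", "not", "nor", "never"}
STOPWORDS = ENGLISH_STOP_WORDS - NEGATIONS

def remove_stopwords(text):
    """Drop common English stopwords from already-cleaned, lowercase text."""
    tokens = text.split()  # split() with no argument also collapses extra whitespace
    return " ".join(t for t in tokens if t not in STOPWORDS)

print(remove_stopwords("the flight was not on time"))
```

If you adopt this step, apply it after cleaning with df['clean_text'] = df['clean_text'].apply(remove_stopwords). Apply the same step to user input at prediction time so that training and inference see identically processed text.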

7. Exploratory Data Analysis (EDA)

Visualize sentiment distribution:

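import seaborn as sns
import matplotlib.pyplot as plt

# Count tweets per class; passing hue avoids the palette-without-hue
# FutureWarning in seaborn >= 0.13 (where countplot accepts legend=)
sns.countplot(data=df, x='airline_sentiment', hue='airline_sentiment', palette='cool', legend=False)
plt.title("Sentiment Distribution")
plt.show()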

You can also generate word clouds for each sentiment class to understand common phrases.
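Word clouds require the third-party wordcloud package. A dependency-free way to get similar information is to count the most frequent words in each class. The sketch below does this. The function name top_words_by_class and the toy tweets are illustrative and not part of the original project. Without stopword removal, filler words like "the" will tend to dominate the counts.

```python
from collections import Counter

def top_words_by_class(texts, labels, n=5):
    """Return the n most frequent tokens for each sentiment label."""
    counters = {}
    for text, label in zip(texts, labels):
        # One Counter per label, updated with that tweet's tokens
        counters.setdefault(label, Counter()).update(text.split())
    return {label: counter.most_common(n) for label, counter in counters.items()}

# Toy data standing in for df['clean_text'] and df['airline_sentiment']
texts = ["great flight great crew", "late flight again", "flight was ok"]
labels = ["positive", "negative", "neutral"]
print(top_words_by_class(texts, labels, n=2))
```

On the project data, the call would be top_words_by_class(df['clean_text'], df['airline_sentiment']).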

8. Feature Extraction with NLP

We’ll convert the cleaned text into numeric features using TF-IDF.

Code Snippet:

from sklearn.feature_extraction.text import TfidfVectorizer

# Keep only the 5,000 most frequent terms as features
vectorizer = TfidfVectorizer(max_features=5000)
X = vectorizer.fit_transform(df['clean_text'])  # sparse matrix, one row per tweet
y = df['airline_sentiment']

Why TF-IDF?

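TF-IDF weights each word by how often it appears in a tweet (term frequency). It then scales that weight down by how many tweets in the corpus contain the word (inverse document frequency). Words that appear almost everywhere, such as "the", get low weights, while distinctive words get high ones. TF-IDF is fast to compute, produces sparse vectors, and is a strong baseline for text classification.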

9. Model Building & Evaluation

We use Logistic Regression for classification.

Code Snippet:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Hold out 20% for testing; stratify keeps class proportions equal in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# max_iter raised from the default (100) to avoid convergence warnings
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
print(classification_report(y_test, predictions))

Metrics to Track:

Precision
Recall
F1 Score
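classification_report prints these metrics for each class. To show where the numbers come from, this sketch builds a confusion matrix for a handful of made-up predictions. It then computes precision, recall, and F1 for the "negative" class by hand. When classes are imbalanced, also check the macro-averaged F1 in the report, because a model can reach high accuracy just by favoring the majority class.

```python
from sklearn.metrics import confusion_matrix

# Toy labels standing in for y_test and predictions
y_true = ["negative", "negative", "negative", "neutral", "positive", "positive"]
y_pred = ["negative", "negative", "neutral", "neutral", "positive", "negative"]
classes = ["negative", "neutral", "positive"]

cm = confusion_matrix(y_true, y_pred, labels=classes)  # rows: true, columns: predicted
print(cm)

# Per-class metrics for "negative" (row/column 0)
tp = cm[0, 0]
precision = tp / cm[:, 0].sum()  # of all tweets predicted negative, share that were right
recall = tp / cm[0, :].sum()     # of all truly negative tweets, share that were found
f1 = 2 * precision * recall / (precision + recall)
print(round(float(precision), 3), round(float(recall), 3), round(float(f1), 3))
```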

You can also try Random Forest, SVM, or Naive Bayes classifiers and compare their scores.
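A compact way to compare these alternatives is a scikit-learn Pipeline evaluated with cross-validation. Placing TfidfVectorizer inside the pipeline refits it on each training fold, so the held-out folds never influence the vocabulary or the IDF weights. The section 8 code, by contrast, fits TF-IDF on the full dataset before splitting, which leaks a little information. The function name compare_models and the model settings (LinearSVC as the SVM, 200 trees for the forest) are illustrative defaults, not tuned values. Macro-F1 is used as the score because it weights every class equally.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def compare_models(texts, labels, cv=5):
    """Mean macro-F1 per model; TF-IDF is refit inside every fold (no leakage)."""
    candidates = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Linear SVM": LinearSVC(),
        "Naive Bayes": MultinomialNB(),
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    }
    results = {}
    for name, clf in candidates.items():
        pipe = make_pipeline(TfidfVectorizer(max_features=5000), clf)
        scores = cross_val_score(pipe, texts, labels, cv=cv, scoring="f1_macro")
        results[name] = scores.mean()
    return results

# Usage with the project data (hypothetical call):
# print(compare_models(df['clean_text'], df['airline_sentiment']))
```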

10. Real-Time Prediction Web App with Streamlit

Let’s use Streamlit to wrap this model in a web app that makes real-time predictions.
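app.py runs as its own process, so it cannot see the model, vectorizer, or clean_text objects created in a notebook or training script. One common approach, assumed here, is to save the fitted objects with joblib (installed alongside scikit-learn) and reload them in the app. The file names model.joblib and vectorizer.joblib are hypothetical, and the toy training data only stands in for the real objects. clean_text also has to be defined in, or imported into, app.py.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-ins for the vectorizer and model trained in the previous sections
texts = ["great flight", "terrible delay", "flight at noon"]
labels = ["positive", "negative", "neutral"]
vectorizer = TfidfVectorizer().fit(texts)
model = LogisticRegression(max_iter=1000).fit(vectorizer.transform(texts), labels)

out_dir = tempfile.mkdtemp()  # in the project, use the folder that holds app.py
joblib.dump(vectorizer, os.path.join(out_dir, "vectorizer.joblib"))
joblib.dump(model, os.path.join(out_dir, "model.joblib"))

# The app can then reload both objects without retraining
loaded_vec = joblib.load(os.path.join(out_dir, "vectorizer.joblib"))
loaded_model = joblib.load(os.path.join(out_dir, "model.joblib"))
print(loaded_model.predict(loaded_vec.transform(["great flight"])))
```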

Code Snippet (app.py):

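import re
import joblib
import streamlit as st

# Assumes the trained objects were saved with joblib.dump(model, "model.joblib")
# and joblib.dump(vectorizer, "vectorizer.joblib"); adjust the paths to your setup
model = joblib.load("model.joblib")
vectorizer = joblib.load("vectorizer.joblib")

def clean_text(text):  # must match the cleaning used during training
    text = re.sub(r"http\S+", "", text)  # remove links
    text = re.sub(r"@\w+", "", text)  # remove mentions
    return re.sub(r"[^A-Za-z\s]", "", text).lower()

st.title("Social Media Sentiment Analysis")
user_input = st.text_area("Enter a tweet:")

if st.button("Predict Sentiment"):
    vector = vectorizer.transform([clean_text(user_input)])
    st.write(f"Predicted Sentiment: **{model.predict(vector)[0]}**")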

How to Run:

streamlit run app.py

Features:

Live text input
Instant prediction
Interactive UI

11. Conclusion

We built a full sentiment analysis pipeline from data cleaning to real-time deployment. This project strengthens your skills in:

Text preprocessing and NLP
Model training and evaluation
Real-time application deployment using Streamlit

Whether you’re a data science learner, ML student, or AI enthusiast, this project offers a practical and rewarding experience.

Ready to apply your data science skills to real-world problems? 👉 Join the BiStartX Internship Program and build impactful machine learning projects with mentorship and certification.

Follow BiStartX on LinkedIn for the latest in ML projects, internships, and real-world AI applications.

More Machine Learning Project Ideas

Looking for more hands-on machine learning experience? These real-world project ideas can help solidify your skills and make your portfolio stand out:

🔐 Credit Card Fraud Detection Using ML: A Complete Guide
Develop an anomaly detection system to flag fraudulent credit card transactions using classification techniques and imbalanced data handling.

🏡 House Price Prediction Using Machine Learning
Use regression algorithms to predict housing prices from features like location, square footage, and number of rooms.

🪙 Gold Price Forecasting with Machine Learning
Forecast gold prices based on economic indicators and historical financial trends.

🚗 Car Accident Case Outcome Prediction
Analyze legal data to predict compensation outcomes from vehicle accident lawsuits.

💸 Loan Approval Prediction System
Classify loan applications as approved or rejected using applicant attributes like credit score, income, and employment history.

🏥 Health Insurance Cross-Selling Model
Identify potential buyers for vehicle insurance policies from existing health insurance customers.

⚖️ Personal Injury Case Outcome Predictor
Predict the outcome of personal injury cases (settlement, dismissal, or trial) using structured legal data.

🚢 Titanic Survival Classification Model
Work on a beginner-friendly classification problem using historical Titanic passenger data.

🛒 Big Mart Sales Prediction (2025 Edition)
Build a regression model to forecast retail sales using data on product categories, store demographics, and promotions.
