Loan Prediction Using Machine Learning | Complete Guide

Build an ML Model to Predict Loan Approvals & Automate Decisions

Introduction

Loan Prediction using Machine Learning is one of the most practical and high-demand applications of AI in financial services.

Financial institutions like banks, credit unions, and fintech companies need to automate the loan approval process to reduce manual effort, minimize risks, and make faster, data-driven decisions.

In this guide, you’ll learn how to:

Understand the business objective of loan prediction
Use historical data to build a supervised ML model
Evaluate and interpret the model
Apply the model in real-world scenarios

Whether you’re a data science learner, a machine learning student, or an AI enthusiast, this project makes a solid addition to your portfolio!

Understanding the Project Objective

Goal: Build an ML model to predict whether a loan application will be approved or rejected, using historical data of previous applicants.

Why it matters:

Manual loan approval takes time
Risk of bias and inconsistency
Helps banks offer instant loan decisions
Reduces default risk with data-driven assessment

Business Impact:

Improve customer experience with faster approvals
Decrease operational costs
Reduce Non-Performing Loans (NPL)

Benefits of Machine Learning in Loan Prediction

📈 More accurate credit risk assessment
⚡ Faster, near-real-time processing
🎯 Personalized loan offers
🛡️ Automated fraud detection
🤖 Scalability across thousands of applications

Dataset Overview

We’ll use the real-world Loan payments dataset (Loan payments data.csv), which has these columns:

Column	Description
Loan_ID	Unique loan ID
loan_status	Repayment status, e.g. PAIDOFF or COLLECTION (target label)
Principal	Loan amount at origination
terms	Repayment term in days (7, 15, or 30)
effective_date	Date the loan was originated and took effect
due_date	Due date for repayment
paid_off_time	Date and time the loan was actually paid off
past_due_days	Number of days the loan was past due
age	Applicant’s age
education	Education level
Gender	Applicant gender

Step-by-Step Code & Explanation

Importing Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

Explanation:

pandas & numpy for data handling
matplotlib & seaborn for visualization
scikit-learn (sklearn) for preprocessing, model building, and evaluation

Data Preprocessing

# Load the dataset
df = pd.read_csv('/mnt/data/Loan payments data.csv')

# Convert date columns to datetime
df['effective_date'] = pd.to_datetime(df['effective_date'])
df['due_date'] = pd.to_datetime(df['due_date'])
df['paid_off_time'] = pd.to_datetime(df['paid_off_time'], errors='coerce')

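# Handle missing values: assign the result back instead of calling
# fillna(inplace=True) on a single column (chained assignment, which newer pandas warns about)
df['past_due_days'] = df['past_due_days'].fillna(0)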

# Drop unnecessary columns
df.drop(['Loan_ID', 'effective_date', 'due_date', 'paid_off_time'], axis=1, inplace=True)

# Encode categorical variables
le_gender = LabelEncoder()
df['Gender'] = le_gender.fit_transform(df['Gender'])

le_education = LabelEncoder()
df['education'] = le_education.fit_transform(df['education'])

# Encode target variable
df['loan_status'] = df['loan_status'].apply(lambda x: 1 if x == 'PAIDOFF' else 0)

# Display clean data
df.head()

Explanation:

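Convert the date columns to datetime (handy for feature engineering, although they are dropped in this version)
Fill missing past_due_days with 0 (a missing value is treated as "never overdue")
Encode categorical variables as integers; LabelEncoder numbers categories in sorted order (e.g. female = 0, male = 1)
Target label: 1 = PAIDOFF (treated as approved), 0 = any other status, such as COLLECTION (treated as rejected)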
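A quick way to check these preprocessing steps is to run them on a few rows. The sketch below uses hypothetical, invented rows with the same column names as the dataset. It shows the assign-back form of fillna, a vectorized version of the target encoding, and the order in which LabelEncoder assigns its integer codes.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical rows that mimic the dataset's columns (values invented for illustration)
df = pd.DataFrame({
    'loan_status': ['PAIDOFF', 'COLLECTION', 'PAIDOFF', 'COLLECTION'],
    'past_due_days': [np.nan, 12.0, np.nan, 30.0],
    'education': ['High School or Below', 'college', 'Bechalor', 'Master or Above'],
})

# Assign back rather than using inplace=True on a single column
df['past_due_days'] = df['past_due_days'].fillna(0)

# Vectorized equivalent of the lambda: 1 for PAIDOFF, 0 for any other status
df['loan_status'] = (df['loan_status'] == 'PAIDOFF').astype(int)

# LabelEncoder assigns codes in sorted label order (uppercase sorts before lowercase)
le_education = LabelEncoder()
df['education'] = le_education.fit_transform(df['education'])
education_codes = {label: code for code, label in enumerate(le_education.classes_.tolist())}
print(education_codes)
```

Because 'college' is lowercase in the dataset, it sorts after every capitalized label and receives the highest code, which is easy to get wrong when writing a mapping by hand.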

Feature-Target Split

# Feature-target split
X = df.drop('loan_status', axis=1)
y = df['loan_status']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Display training data shape
print(f"Training data shape: {X_train.shape}")
print(f"Test data shape: {X_test.shape}")

Explanation:

Define features and target
Split data into training (80%) and testing (20%)
Random seed ensures reproducibility
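If the classes are imbalanced (for example, more paid-off loans than collections), passing stratify=y to train_test_split keeps the class ratio the same in the training and test sets. This is an optional variation on the split above, shown here on hypothetical labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical, imbalanced labels: 80 positives (paid off) and 20 negatives
y = np.array([1] * 80 + [0] * 20)
X = np.arange(100).reshape(-1, 1)  # placeholder feature column

# stratify=y preserves the 80/20 class ratio in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(np.bincount(y_test))  # counts of class 0 and class 1 in the test set
```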

Model Building

# Initialize RandomForestClassifier
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)

# Train the model
rf_model.fit(X_train, y_train)

Explanation:

Use Random Forest, an ensemble of decision trees (n_estimators=100) whose predicted class probabilities are averaged into one prediction
Fit model on training data
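Once fitted, a RandomForestClassifier exposes feature_importances_, which supports the "interpret the model" goal from the introduction. The sketch below fits the same kind of model on a hypothetical synthetic dataset: age and Principal are pure noise, while a past_due_days-like column is non-zero only for loans that were not paid off. The importances single out that column, which is the pattern to watch for when a feature leaks the target.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 400

# Hypothetical synthetic data: label 1 = paid off, 0 = not paid off
y = rng.integers(0, 2, size=n)
X = pd.DataFrame({
    'age': rng.integers(18, 60, size=n),                     # noise
    'Principal': rng.choice([300, 500, 800, 1000], size=n),  # noise
    # Non-zero only when the loan was not paid off, so it reveals the label
    'past_due_days': np.where(y == 0, rng.integers(1, 60, size=n), 0),
})

model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)

# Impurity-based importances (they sum to 1), highest first
importances = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances)
```

On the real data you would run the last two lines with rf_model and X_train in place of model and X.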

Model Evaluation

# Predict on test data
y_pred = rf_model.predict(X_test)

# Accuracy score
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

# Confusion matrix
conf_mat = confusion_matrix(y_test, y_pred)
sns.heatmap(conf_mat, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()

# Classification report
print(classification_report(y_test, y_pred))

Explanation:

Evaluate with accuracy, the confusion matrix, and per-class precision, recall, and F1-score from the classification report
Visualize model performance
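Reading the confusion matrix is easier after working through a tiny case by hand. With the hypothetical labels below, scikit-learn puts actual classes on the rows and predicted classes on the columns, both in sorted label order, so ravel() returns true negatives, false positives, false negatives, and true positives:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels: 1 = PAIDOFF, 0 = not paid off
y_test = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1]

cm = confusion_matrix(y_test, y_pred)
# Rows = actual (0, 1), columns = predicted (0, 1)
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")  # (tp + tn) / total
```

Here tp = 2, tn = 1, fp = 1, and fn = 1, so accuracy is 3/5 and the precision and recall for class 1 are both 2/3.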

🗂️ Save the Model

# Import the joblib library
import joblib

# Define the filename for the model
model_filename = 'loan_prediction_rf_model.joblib'

# Save the trained Random Forest model
joblib.dump(rf_model, model_filename)

print(f"Model saved successfully as {model_filename}")

Explanation:

joblib is well suited to saving scikit-learn models because it stores the large NumPy arrays inside them efficiently.
The file loan_prediction_rf_model.joblib will contain the entire trained RandomForestClassifier.
You can later load this model with:

# To load the model later:
rf_model_loaded = joblib.load('loan_prediction_rf_model.joblib')
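Before deploying, it is worth confirming that a saved model reloads correctly. This sketch trains a small model on hypothetical random data, saves it with joblib.dump into a temporary directory, reloads it with joblib.load, and checks that both copies make identical predictions:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical training data, only so there is a fitted model to persist
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = (X[:, 0] > 0).astype(int)
model = RandomForestClassifier(n_estimators=10, random_state=42).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'loan_prediction_rf_model.joblib')
    joblib.dump(model, path)    # serialize the fitted estimator to disk
    loaded = joblib.load(path)  # read it back into a new object

# The reloaded model should reproduce the original predictions exactly
same = np.array_equal(model.predict(X), loaded.predict(X))
print(same)
```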

Now, here’s a complete Streamlit app you can use to deploy this Loan Prediction model:

# loan_prediction_app.py

import streamlit as st
import pandas as pd
import joblib

# Load the trained model
model = joblib.load('loan_prediction_rf_model.joblib')

# App title
st.title("🏦 Loan Approval Prediction App")
st.write("""
This app predicts whether a loan application will be approved or not based on applicant data.
""")

# Input form
st.header("Applicant Information")

principal = st.slider("Loan Amount (Principal)", 300, 1000, step=50)
terms = st.selectbox("Loan Terms (days)", [7, 15, 30])
age = st.slider("Applicant Age", 18, 70, step=1)

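# Labels are spelled exactly as in the dataset ('Bechalor' is the dataset's spelling)
education = st.selectbox("Education Level", [
    'High School or Below', 'college', 'Bechalor', 'Master or Above'
])
# Codes must match the training LabelEncoder, which numbers labels in sorted
# order (uppercase before lowercase), so 'college' gets the highest code
education_mapping = {
    'Bechalor': 0,
    'High School or Below': 1,
    'Master or Above': 2,
    'college': 3
}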

gender = st.radio("Gender", ['male', 'female'])
gender_mapping = {'male': 1, 'female': 0}

past_due_days = st.slider("Past Due Days", 0, 100, step=1)

# Prepare input data
input_data = pd.DataFrame({
    'Principal': [principal],
    'terms': [terms],
    'past_due_days': [past_due_days],
    'age': [age],
    'education': [education_mapping[education]],
    'Gender': [gender_mapping[gender]]
})

# Prediction
if st.button("Predict Loan Approval"):
    prediction = model.predict(input_data)[0]
    if prediction == 1:
        st.success("✅ The loan is likely to be APPROVED!")
    else:
        st.error("❌ The loan is likely to be REJECTED.")
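Hard-coded mappings like education_mapping must match the integer codes LabelEncoder produced during training. One way to rule out mismatches, sketched below under the assumption that you also save the fitted encoders (for example with joblib.dump(le_education, 'le_education.joblib')), is to encode the app's inputs with those same encoders. The helper build_input_row and the encoder filenames are hypothetical; in the app you would load the encoders with joblib.load instead of refitting them.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Stand-ins for the encoders fitted in the training script (assumption:
# in the app these would come from joblib.load on saved files)
le_education = LabelEncoder().fit(
    ['High School or Below', 'college', 'Bechalor', 'Master or Above'])
le_gender = LabelEncoder().fit(['male', 'female'])


def build_input_row(principal, terms, past_due_days, age, education, gender):
    """Hypothetical helper: one-row DataFrame in the training column order,
    encoded with the same fitted LabelEncoders used during training."""
    return pd.DataFrame({
        'Principal': [principal],
        'terms': [terms],
        'past_due_days': [past_due_days],
        'age': [age],
        'education': le_education.transform([education]),
        'Gender': le_gender.transform([gender]),
    })


row = build_input_row(800, 30, 0, 35, 'college', 'female')
print(row)
```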

How to Run the Streamlit App

# Install streamlit if not installed
pip install streamlit

# Run the app
streamlit run loan_prediction_app.py

Conclusion

👉 We’ve successfully built a Loan Prediction Machine Learning Model using Python and a Random Forest classifier.

Key takeaways:

ML helps financial institutions automate loan approvals.
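The model uses Principal, terms, past_due_days, age, education, and Gender as features. Read its accuracy with care: past_due_days is only known after a loan falls overdue, so it can give away the outcome. Retrain without it to see how well application-time features alone predict repayment.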
You can further improve it using:
- Advanced models (XGBoost, LightGBM)
- Hyperparameter tuning
- Feature engineering
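For the hyperparameter tuning suggestion, scikit-learn's GridSearchCV tries every combination in a parameter grid using cross-validation. Here is a minimal sketch on synthetic data from make_classification; the grid values are illustrative assumptions, and in practice you would pass the tutorial's X_train and y_train:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for X_train / y_train (hypothetical data)
X_train, y_train = make_classification(n_samples=200, n_features=6, random_state=42)

# A small, illustrative grid: 2 x 2 x 2 = 8 combinations
param_grid = {
    'n_estimators': [50, 100],
    'max_depth': [None, 5],
    'min_samples_leaf': [1, 5],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=3,               # 3-fold cross-validation on the training data only
    scoring='accuracy',
)
search.fit(X_train, y_train)

print(search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.2f}")
```

Keeping the test set out of the search means the final evaluation on X_test still reflects unseen data.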

Real-world deployment:

Integrate this model into banking software
Enable real-time decision-making at scale

🚀 Want to take your Data Science skills to the next level?
👉 Practice more ML projects on real-world datasets.
👉 Follow this tutorial to create a complete portfolio project.
👉 Connect with us on LinkedIn for more advanced tutorials!

🔍 More Machine Learning Project Ideas to Sharpen Your Skills

Looking to expand your machine learning portfolio? Here are some project ideas covering a wide range of real-world applications, well suited to data science learners and AI enthusiasts:

⚖️ Personal Injury Case Outcome Prediction

Build a classification model that predicts the outcome of personal injury legal cases using historical court data and legal documents.

🚢 Titanic Dataset – Exploratory Data Analysis & Prediction

Perform in-depth exploratory data analysis (EDA) on the Titanic dataset, uncover hidden patterns, and develop predictive models to forecast survival outcomes.

🛒 Big Mart Sales Prediction Project (2025 Edition)

Use regression techniques to forecast sales across various Big Mart outlets by analyzing product features, store types, and seasonal demand.

🥔 Potato Leaf Disease Detection Using Deep Learning

Apply computer vision techniques to classify diseases in potato leaves. Utilize CNN-based models for accurate plant health diagnostics.

✋ Hand Gesture Recognition with Deep Learning

Design a deep learning system that recognizes hand gestures in real time using Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs).

🚗 Car Accident Attorney Case Viability Prediction

Predict the profitability or success potential of automobile accident cases for attorneys using legal data and case history.

💳 Credit Card Fraud Detection System

Detect fraudulent transactions using machine learning and anomaly detection techniques. Focus on precision and real-time prediction to reduce financial risk.

🛡️ Insurance Claim Severity Modeling

Forecast the severity of insurance claims by analyzing policyholder profiles, claim types, and incident data using advanced regression or XGBoost models.

BiStartX