Loan Prediction Using Machine Learning | Complete Guide


Build an ML Model to Predict Loan Approvals & Automate Decisions

Introduction

Loan Prediction using Machine Learning is one of the most practical and high-demand applications of AI in financial services.

Financial institutions like banks, credit unions, and fintech companies need to automate the loan approval process to reduce manual effort, minimize risks, and make faster, data-driven decisions.

In this guide, you’ll learn how to:

  • Understand the business objective of loan prediction
  • Use historical data to build a supervised ML model
  • Evaluate and interpret the model
  • Apply the model in real-world scenarios

Whether you’re a data science learner, a machine learning student, or an AI enthusiast, this project makes a solid addition to your portfolio.

Understanding the Project Objective

Goal: Build an ML model to predict whether a loan application will be approved or rejected, using historical data of previous applicants.

Why it matters:

  • Manual loan approval is slow
  • Manual review carries a risk of bias and inconsistency
  • Automation helps banks offer instant loan decisions
  • Data-driven assessment reduces default risk

Business Impact:

  • Improve customer experience with faster approvals
  • Decrease operational costs
  • Reduce Non-Performing Loans (NPL)

Benefits of Machine Learning in Loan Prediction

📈 Higher Accuracy of credit risk assessment
⚡ Faster processing time (real-time)
🎯 Personalized loan offers
🛡️ Automated fraud detection
🤖 Scalability across thousands of applications

Dataset Overview

We’ll use this real-world dataset:

Column            Description
Loan_ID           Unique ID
loan_status       Target label (Approved/Rejected)
Principal         Loan amount
terms             Duration of loan
effective_date    Date of loan approval
due_date          Due date for repayment
paid_off_time     Actual repayment time
past_due_days     Number of overdue days
age               Applicant’s age
education         Education level
Gender            Applicant gender

Step-by-Step Code & Explanation

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

Explanation:

  • pandas & numpy for data handling
  • matplotlib & seaborn for visualization
  • sklearn for preprocessing, model building, evaluation

# Load the dataset (adjust the path to wherever the CSV is saved)
df = pd.read_csv('/mnt/data/Loan payments data.csv')

# Convert date columns to datetime
df['effective_date'] = pd.to_datetime(df['effective_date'])
df['due_date'] = pd.to_datetime(df['due_date'])
df['paid_off_time'] = pd.to_datetime(df['paid_off_time'], errors='coerce')

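# Handle missing values (assign the result back; inplace fillna on a
# selected column is deprecated in recent pandas and may not update df)
df['past_due_days'] = df['past_due_days'].fillna(0)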

# Drop unnecessary columns
df.drop(['Loan_ID', 'effective_date', 'due_date', 'paid_off_time'], axis=1, inplace=True)

# Encode categorical variables
le_gender = LabelEncoder()
df['Gender'] = le_gender.fit_transform(df['Gender'])

le_education = LabelEncoder()
df['education'] = le_education.fit_transform(df['education'])

# Encode target variable: 1 for 'PAIDOFF', 0 for every other status
df['loan_status'] = (df['loan_status'] == 'PAIDOFF').astype(int)

# Display clean data
df.head()

Explanation:

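  • Parse the date columns so they can be used for feature engineering; this version then drops them together with Loan_ID
  • Fill missing past_due_days values with 0
  • Encode the categorical columns (Gender, education) as integers, since scikit-learn models need numeric input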
  • Target label: 1 = Paid Off (Approved), 0 = Collection (Rejected)
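The "potential feature engineering" on the date columns could look something like the sketch below. It would have to run before the date columns are dropped. The rows are made up for illustration, and the new column names (planned_days, start_dayofweek, weekend_start) are hypothetical, not part of the dataset.

```python
import pandas as pd

# Made-up rows that mimic the two date columns described above
sample = pd.DataFrame({
    "effective_date": ["2016-09-08", "2016-09-10", "2016-09-11"],
    "due_date": ["2016-10-07", "2016-09-24", "2016-09-17"],
})

for col in ["effective_date", "due_date"]:
    sample[col] = pd.to_datetime(sample[col])

# Planned loan length in days, known at application time
sample["planned_days"] = (sample["due_date"] - sample["effective_date"]).dt.days

# Day of week the loan starts: Monday=0 ... Sunday=6
sample["start_dayofweek"] = sample["effective_date"].dt.dayofweek

# 1 if the loan starts on a Saturday or Sunday, else 0
sample["weekend_start"] = (sample["start_dayofweek"] >= 5).astype(int)

print(sample[["planned_days", "start_dayofweek", "weekend_start"]])
```

Whether such features actually help is something to check in the evaluation step rather than assume.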
# Feature-target split
X = df.drop('loan_status', axis=1)
y = df['loan_status']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Display training data shape
print(f"Training data shape: {X_train.shape}")
print(f"Test data shape: {X_test.shape}")

Explanation:

  • Define features and target
  • Split data into training (80%) and testing (20%)
  • Random seed ensures reproducibility
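If the two classes are imbalanced, a plain random split can leave the test set with a different class mix than the training set. One common option, not used in the code above, is the stratify argument of train_test_split. A minimal sketch on synthetic labels (the 75/25 class ratio is an assumption chosen for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels: 300 of class 1, 100 of class 0 (illustrative only)
y_demo = np.array([1] * 300 + [0] * 100)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

# stratify=y_demo keeps the class proportions the same in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(f"Share of class 1 in train: {y_tr.mean():.2f}")
print(f"Share of class 1 in test:  {y_te.mean():.2f}")
```

In the tutorial's split this would mean adding stratify=y to the existing call.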
# Initialize RandomForestClassifier
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)

# Train the model
rf_model.fit(X_train, y_train)

Explanation:

  • Use Random Forest, a robust ensemble classifier
  • Fit model on training data
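A fitted Random Forest also exposes feature_importances_, which is a quick first step toward interpreting the model. The sketch below uses synthetic data from make_classification with generic column names, so the ranking it prints says nothing about the loan dataset; on the real model you would pass X_train.columns as the index.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data with generic feature names (illustrative only)
feature_names = [f"feature_{i}" for i in range(6)]
X_arr, y_demo = make_classification(
    n_samples=300, n_features=6, n_informative=3, n_redundant=0, random_state=42
)
X_demo = pd.DataFrame(X_arr, columns=feature_names)

rf_demo = RandomForestClassifier(n_estimators=100, random_state=42)
rf_demo.fit(X_demo, y_demo)

# Impurity-based importances; they are normalized to sum to 1
importances = pd.Series(
    rf_demo.feature_importances_, index=feature_names
).sort_values(ascending=False)

print(importances)
```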
# Predict on test data
y_pred = rf_model.predict(X_test)

# Accuracy score
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

# Confusion matrix
conf_mat = confusion_matrix(y_test, y_pred)
sns.heatmap(conf_mat, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()

# Classification report
print(classification_report(y_test, y_pred))

Explanation:

  • Evaluate with accuracy, confusion matrix, precision-recall
  • Visualize model performance
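To read the heatmap, it helps to know that scikit-learn puts actual classes on the rows and predicted classes on the columns. The small example below uses hand-made labels (not model output) to show how precision and recall for class 1 fall out of the four cells.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hand-made labels for illustration: 1 = paid off, 0 = collection
y_true = [1, 1, 1, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 1, 0, 1, 0, 1, 1, 1, 1, 1]

# For binary labels, ravel() yields the cells in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"tn={tn} fp={fp} fn={fn} tp={tp}")  # tn=1 fp=2 fn=1 tp=6

# Precision: of everything predicted as 1, how much really was 1
precision = tp / (tp + fp)
# Recall: of everything that really was 1, how much was found
recall = tp / (tp + fn)

print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.75 recall=0.86
```

These are the same numbers classification_report prints in the row for class 1.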
# Import the joblib library
import joblib

# Define the filename for the model
model_filename = 'loan_prediction_rf_model.joblib'

# Save the trained Random Forest model
joblib.dump(rf_model, model_filename)

print(f"Model saved successfully as {model_filename}")

Explanation:

  • joblib is highly efficient for saving scikit-learn models.
  • The file loan_prediction_rf_model.joblib will contain the entire trained RandomForestClassifier.
  • You can later load this model with:
# To load the model later:
rf_model_loaded = joblib.load('loan_prediction_rf_model.joblib')
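Any code that later feeds the model needs the same category encoding that was used in training. One possible approach, sketched here as an assumption rather than as part of the tutorial's code, is to save the encoder and the model together in a single joblib file. The data, the bundle keys, and the file name loan_bundle_demo.joblib are all hypothetical.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

# Tiny synthetic training set (illustrative only)
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(100, 3))
y_demo = (X_demo[:, 0] > 0).astype(int)

le_demo = LabelEncoder().fit(["male", "female"])
rf_demo = RandomForestClassifier(n_estimators=20, random_state=42).fit(X_demo, y_demo)

# Bundle the model and its encoder so they are always loaded together
bundle = {"model": rf_demo, "le_gender": le_demo}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "loan_bundle_demo.joblib"
    joblib.dump(bundle, path)
    restored = joblib.load(path)

# The reloaded model should reproduce the original predictions exactly
same_predictions = np.array_equal(
    rf_demo.predict(X_demo), restored["model"].predict(X_demo)
)
print("Predictions identical after reload:", same_predictions)
print("Encoder classes:", list(restored["le_gender"].classes_))
```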
# loan_prediction_app.py (save the code below as a separate file)

import streamlit as st
import pandas as pd
import joblib

# Load the trained model
model = joblib.load('loan_prediction_rf_model.joblib')

# App title
st.title("🏦 Loan Approval Prediction App")
st.write("""
This app predicts whether a loan application will be approved or not based on applicant data.
""")

# Input form
st.header("Applicant Information")

principal = st.slider("Loan Amount (Principal)", 300, 1000, step=50)
terms = st.selectbox("Loan Terms (days)", [7, 15, 30])
age = st.slider("Applicant Age", 18, 70, step=1)

education = st.selectbox("Education Level", [
    'High School or Below', 'college', 'Bechalor', 'Master or Above'
])
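# LabelEncoder assigns codes in sorted order (uppercase before lowercase),
# so the mapping has to follow that order to match the training data.
# 'Bechalor' is the spelling used in the dataset.
education_mapping = {
    'Bechalor': 0,
    'High School or Below': 1,
    'Master or Above': 2,
    'college': 3
}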

gender = st.radio("Gender", ['male', 'female'])
gender_mapping = {'male': 1, 'female': 0}

past_due_days = st.slider("Past Due Days", 0, 100, step=1)

# Prepare input data
input_data = pd.DataFrame({
    'Principal': [principal],
    'terms': [terms],
    'past_due_days': [past_due_days],
    'age': [age],
    'education': [education_mapping[education]],
    'Gender': [gender_mapping[gender]]
})

# Prediction
if st.button("Predict Loan Approval"):
    prediction = model.predict(input_data)[0]
    if prediction == 1:
        st.success("✅ The loan is likely to be APPROVED!")
    else:
        st.error("❌ The loan is likely to be REJECTED.")

# Install Streamlit if it is not already installed (run in a terminal)
pip install streamlit

# Run the app
streamlit run loan_prediction_app.py
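The app's education_mapping has to agree with the codes LabelEncoder produced during training. LabelEncoder numbers the classes in sorted order, so one way to avoid a mismatch is to derive the mapping from a fitted encoder instead of typing it by hand. A minimal sketch, assuming the four education values listed in the app are the only ones in the data:

```python
from sklearn.preprocessing import LabelEncoder

# The four education values offered in the app's select box
education_levels = ["High School or Below", "college", "Bechalor", "Master or Above"]

le_education = LabelEncoder().fit(education_levels)

# classes_ is sorted; each label's position is its integer code
education_mapping = {
    label: code for code, label in enumerate(le_education.classes_)
}

print(education_mapping)
# {'Bechalor': 0, 'High School or Below': 1, 'Master or Above': 2, 'college': 3}
```

Uppercase letters sort before lowercase ones, which is why 'college' ends up with the highest code.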

Conclusion

👉 We’ve successfully built a Loan Prediction Machine Learning Model using Python and RandomForest.

Key takeaways:

  • ML helps financial institutions automate loan approvals.
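  • Using Principal, terms, past_due_days, age, education, and Gender as features, the Random Forest gives a working baseline; judge its quality from the accuracy, confusion matrix, and classification report in the evaluation step. Keep in mind that past_due_days is only known after a loan has been issued, so it is not available when scoring a brand-new application.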
  • You can further improve it using:
    • Advanced models (XGBoost, LightGBM)
    • Hyperparameter tuning
    • Feature engineering
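As a starting point for the hyperparameter tuning mentioned above, a small grid search might look like the following. The data is synthetic, and the grid values, the 3-fold setting, and the F1 scoring are arbitrary choices for illustration, not tuned recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the loan features (illustrative only)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=6, n_informative=3, random_state=42
)

# A deliberately small grid so the search finishes quickly
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, None],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=3,          # 3-fold cross-validation for each combination
    scoring="f1",  # F1 for the positive class
)
search.fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated F1: {search.best_score_:.3f}")
```

On the real data you would fit the search on X_train and y_train only, then check the chosen model once on the held-out test set.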

Real-world deployment:

  • Integrate this model into banking software
  • Enable real-time decision-making at scale

🚀 Want to take your Data Science skills to the next level?
👉 Practice more ML projects on real-world datasets.
👉 Follow this tutorial to create a complete portfolio project.
👉 Connect with us on LinkedIn for more advanced tutorials!

🔍 More Machine Learning Project Ideas to Sharpen Your Skills

Looking to expand your machine learning portfolio? Here are some impactful project ideas that cover a wide range of real-world applications—perfect for data science learners and AI enthusiasts:

⚖️ Personal Injury Case Outcome Prediction

Build a classification model that predicts the outcome of personal injury legal cases using historical court data and legal documents.

🚢 Titanic Dataset – Exploratory Data Analysis & Prediction

Perform in-depth exploratory data analysis (EDA) on the Titanic dataset, uncover hidden patterns, and develop predictive models to forecast survival outcomes.

🛒 Big Mart Sales Prediction Project (2025 Edition)

Use regression techniques to forecast sales across various Big Mart outlets by analyzing product features, store types, and seasonal demand.

🥔 Potato Leaf Disease Detection Using Deep Learning

Apply computer vision techniques to classify diseases in potato leaves. Utilize CNN-based models for accurate plant health diagnostics.

Hand Gesture Recognition with Deep Learning

Design a deep learning system that recognizes hand gestures in real-time using Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs).

🚗 Car Accident Attorney Case Viability Prediction

Predict the profitability or success potential of automobile accident cases for attorneys using legal data and case history.

💳 Credit Card Fraud Detection System

Detect fraudulent transactions using machine learning and anomaly detection techniques. Focus on precision and real-time prediction to reduce financial risk.

🛡️ Insurance Claim Severity Modeling

Forecast the severity of insurance claims by analyzing policyholder profiles, claim types, and incident data using advanced regression or XGBoost models.
