Build an ML Model to Predict Loan Approvals & Automate Decisions
Introduction
Loan Prediction using Machine Learning is one of the most practical and high-demand applications of AI in financial services.
Financial institutions like banks, credit unions, and fintech companies need to automate the loan approval process to reduce manual effort, minimize risks, and make faster, data-driven decisions.
In this guide, you’ll learn how to:
- Understand the business objective of loan prediction
- Use historical data to build a supervised ML model
- Evaluate and interpret the model
- Apply the model in real-world scenarios
Whether you’re a Data Science learner, Machine Learning student, or AI enthusiast, this project will enrich your portfolio!
Understanding the Project Objective
Goal: Build an ML model to predict whether a loan application will be approved or rejected, using historical data of previous applicants.
Why it matters:
- Manual loan approval is slow
- Manual review carries a risk of bias and inconsistency
- An automated model helps banks offer near-instant loan decisions
- Data-driven assessment can reduce default risk
Business Impact:
- Improve customer experience with faster approvals
- Decrease operational costs
- Reduce Non-Performing Loans (NPL)
Benefits of Machine Learning in Loan Prediction
📈 Higher Accuracy of credit risk assessment
⚡ Faster processing time (real-time)
🎯 Personalized loan offers
🛡️ Automated fraud detection
🤖 Scalability across thousands of applications
Dataset Overview
We’ll use a real-world loan payments dataset (Loan payments data.csv) with the following columns:
Column | Description |
---|---|
Loan_ID | Unique ID |
loan_status | Target label: repayment status (PAIDOFF or a collection status), used here as a proxy for Approved/Rejected |
Principal | Loan amount |
terms | Loan term in days |
effective_date | Date of loan approval |
due_date | Due date for repayment |
paid_off_time | Actual repayment time |
past_due_days | Number of overdue days |
age | Applicant’s age |
education | Education level |
Gender | Applicant gender |
Step-by-Step Code & Explanation
Importing Libraries
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    from sklearn.preprocessing import LabelEncoder
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
Explanation:
- pandas & numpy for data handling
- matplotlib & seaborn for visualization
- sklearn for preprocessing, model building, evaluation
Data Preprocessing
    # Load the dataset
    df = pd.read_csv('/mnt/data/Loan payments data.csv')

    # Convert date columns to datetime
    df['effective_date'] = pd.to_datetime(df['effective_date'])
    df['due_date'] = pd.to_datetime(df['due_date'])
    df['paid_off_time'] = pd.to_datetime(df['paid_off_time'], errors='coerce')

    # Handle missing values (assign the result back; chained inplace fillna
    # triggers warnings in recent pandas versions)
    df['past_due_days'] = df['past_due_days'].fillna(0)

    # Drop unnecessary columns
    df.drop(['Loan_ID', 'effective_date', 'due_date', 'paid_off_time'], axis=1, inplace=True)

    # Encode categorical variables
    le_gender = LabelEncoder()
    df['Gender'] = le_gender.fit_transform(df['Gender'])
    le_education = LabelEncoder()
    df['education'] = le_education.fit_transform(df['education'])

    # Encode target variable: 1 for PAIDOFF, 0 for any other status
    df['loan_status'] = df['loan_status'].apply(lambda x: 1 if x == 'PAIDOFF' else 0)

    # Display clean data
    df.head()
Explanation:
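- Convert the date columns to datetime so they could be used for feature engineering; this version drops them before training
- Fill missing past_due_days values with 0
- Encode categorical variables as integers for ML compatibility; LabelEncoder assigns the codes in sorted label order
- Target label: 1 = Paid Off (treated as Approved), 0 = any other status such as Collection (treated as Rejected)
- Note that past_due_days is only known after a loan has been issued, so it leaks repayment information and would not be available when scoring a new application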
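The date columns are converted and then dropped, so no date-based feature reaches the model. As a minimal sketch of what date feature engineering could look like, the block below derives the day of the week and a weekend flag from effective_date. The two rows and the `dayofweek` and `weekend` column names are illustrative assumptions, not part of the original pipeline.

```python
import pandas as pd

# Two made-up rows for illustration only; real values come from the CSV.
demo = pd.DataFrame({"effective_date": ["2016-09-08", "2016-09-10"]})
demo["effective_date"] = pd.to_datetime(demo["effective_date"])

# pandas numbers the days Monday=0 ... Sunday=6
demo["dayofweek"] = demo["effective_date"].dt.dayofweek

# Hypothetical flag: 1 if the loan started on a Saturday or Sunday
demo["weekend"] = (demo["dayofweek"] >= 5).astype(int)

print(demo)
```

To use features like these, they would need to be computed before the date columns are dropped, and the same inputs would have to be collected wherever the model is served.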
Feature-Target and Train-Test Split
    # Feature-target split
    X = df.drop('loan_status', axis=1)
    y = df['loan_status']

    # Train-test split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Display training data shape
    print(f"Training data shape: {X_train.shape}")
    print(f"Test data shape: {X_test.shape}")
Explanation:
- Define features and target
- Split data into training (80%) and testing (20%)
- Random seed ensures reproducibility
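When one class is much more common than the other, a plain random split can leave the test set with a different class ratio than the training set. One option, shown here as a sketch on synthetic labels rather than the real data, is the `stratify` argument of `train_test_split`. The names `X_demo` and `y_demo` and the 80/20 class ratio are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels for illustration: 80 positives, 20 negatives
y_demo = np.array([1] * 80 + [0] * 20)
X_demo = np.arange(100).reshape(-1, 1)  # placeholder feature column

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# With stratify, both splits keep the 80/20 class ratio
print("Share of positives in train:", y_tr.mean())
print("Share of positives in test:", y_te.mean())
```

In the tutorial's split, this would correspond to adding `stratify=y` to the existing call.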
Model Building
    # Initialize RandomForestClassifier
    rf_model = RandomForestClassifier(n_estimators=100, random_state=42)

    # Train the model
    rf_model.fit(X_train, y_train)
Explanation:
- Use Random Forest, a robust ensemble classifier
- Fit model on training data
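After fitting, the `feature_importances_` attribute shows which columns the forest relies on. The sketch below uses synthetic data in which a `past_due_days` column is deliberately built from the label, to illustrate how a feature that is only known after the outcome can dominate a model. The data, the column subset, and the name `forest` are assumptions for illustration; the same check is worth running on the real `rf_model` and `X_train`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
n = 300

# Synthetic stand-in data, not the real dataset
y_demo = rng.integers(0, 2, size=n)
X_demo = pd.DataFrame({
    "Principal": rng.choice([300, 500, 800, 1000], size=n),
    "age": rng.integers(18, 60, size=n),
    # Leaky by construction: zero when paid off, positive otherwise
    "past_due_days": np.where(y_demo == 1, 0, rng.integers(1, 60, size=n)),
})

forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_demo, y_demo)

# Importances are normalized to sum to 1 across features
importances = pd.Series(forest.feature_importances_, index=X_demo.columns)
print(importances.sort_values(ascending=False))
```

If one column carries almost all the importance on the real data, it is a hint to ask whether that column would actually be available at decision time.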
Model Evaluation
    # Predict on test data
    y_pred = rf_model.predict(X_test)

    # Accuracy score
    accuracy = accuracy_score(y_test, y_pred)
    print(f"Accuracy: {accuracy:.2f}")

    # Confusion matrix
    conf_mat = confusion_matrix(y_test, y_pred)
    sns.heatmap(conf_mat, annot=True, fmt='d', cmap='Blues')
    plt.title('Confusion Matrix')
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.show()

    # Classification report
    print(classification_report(y_test, y_pred))
Explanation:
- Evaluate with accuracy, a confusion matrix, and the precision, recall, and F1-score in the classification report
- Visualize the confusion matrix as a heatmap
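To make the link between the confusion matrix and the classification report concrete, the sketch below computes accuracy, precision, recall, and F1 by hand from the four cells of the matrix. The labels in `y_true` and `y_hat` are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels and predictions, for illustration only
y_true = [1, 1, 1, 1, 0, 0, 1, 0, 1, 1]
y_hat = [1, 1, 1, 0, 0, 1, 1, 0, 1, 1]

# For binary labels {0, 1}, scikit-learn lays the matrix out as
# [[TN, FP], [FN, TP]] (rows = actual, columns = predicted)
tn, fp, fn, tp = (int(v) for v in confusion_matrix(y_true, y_hat).ravel())

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # of the predicted positives, how many were correct
recall = tp / (tp + fn)     # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In a lending setting the two error types have different costs, so precision and recall per class are usually more informative than accuracy alone.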
🗂️ Save the Model
    # Import the joblib library
    import joblib

    # Define the filename for the model
    model_filename = 'loan_prediction_rf_model.joblib'

    # Save the trained Random Forest model
    joblib.dump(rf_model, model_filename)
    print(f"Model saved successfully as {model_filename}")
Explanation:
- joblib is highly efficient for saving scikit-learn models.
- The file loan_prediction_rf_model.joblib will contain the entire trained RandomForestClassifier.
- You can later load this model with:
    # To load the model later:
    rf_model_loaded = joblib.load('loan_prediction_rf_model.joblib')
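A quick way to confirm that saving and loading worked is to check that the reloaded model gives the same predictions as the original. The sketch below does this on a small synthetic dataset and writes to a temporary directory; the data, the filename `demo_model.joblib`, and the variable names are assumptions for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Small synthetic training set, for illustration only
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = (X_demo[:, 0] > 0).astype(int)

original = RandomForestClassifier(n_estimators=10, random_state=42).fit(X_demo, y_demo)

# Save to a temporary directory and load the model back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_model.joblib")  # hypothetical filename
    joblib.dump(original, path)
    restored = joblib.load(path)

# The restored model should reproduce the original predictions exactly
same = bool(np.array_equal(original.predict(X_demo), restored.predict(X_demo)))
print("Predictions identical after reload:", same)
```

Because joblib files are pickles, they should only be loaded from trusted sources, ideally with the same scikit-learn version that was used to save them.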
Now, here’s a complete Streamlit app you can use to deploy this Loan Prediction model:
    # loan_prediction_app.py
    import streamlit as st
    import pandas as pd
    import joblib

    # Load the trained model
    model = joblib.load('loan_prediction_rf_model.joblib')

    # App title
    st.title("🏦 Loan Approval Prediction App")
    st.write("""
    This app predicts whether a loan application will be approved or not based on applicant data.
    """)

    # Input form
    st.header("Applicant Information")
    principal = st.slider("Loan Amount (Principal)", 300, 1000, step=50)
    terms = st.selectbox("Loan Terms (days)", [7, 15, 30])
    age = st.slider("Applicant Age", 18, 70, step=1)
    education = st.selectbox("Education Level", [
        'High School or Below', 'college', 'Bechalor', 'Master or Above'
    ])
    # Codes must match the LabelEncoder used in training, which numbers labels
    # in sorted order (uppercase before lowercase). This assumes the training
    # data contains exactly these four labels.
    education_mapping = {
        'Bechalor': 0,
        'High School or Below': 1,
        'Master or Above': 2,
        'college': 3
    }
    gender = st.radio("Gender", ['male', 'female'])
    # Sorted order gives female = 0, male = 1
    gender_mapping = {'male': 1, 'female': 0}
    past_due_days = st.slider("Past Due Days", 0, 100, step=1)

    # Prepare input data (same column order as the training features)
    input_data = pd.DataFrame({
        'Principal': [principal],
        'terms': [terms],
        'past_due_days': [past_due_days],
        'age': [age],
        'education': [education_mapping[education]],
        'Gender': [gender_mapping[gender]]
    })

    # Prediction
    if st.button("Predict Loan Approval"):
        prediction = model.predict(input_data)[0]
        if prediction == 1:
            st.success("✅ The loan is likely to be APPROVED!")
        else:
            st.error("❌ The loan is likely to be REJECTED.")
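The integer codes used in the app's mapping dictionaries have to match the codes that LabelEncoder assigned during training, otherwise the model receives the wrong category. As a sketch, assuming the training column contains exactly the four education labels listed in the app, the codes can be read back from a fitted encoder instead of being written by hand. The names `labels`, `encoder`, and `education_codes` are illustrative.

```python
from sklearn.preprocessing import LabelEncoder

# Education labels as they appear in the app's select box
labels = ['High School or Below', 'college', 'Bechalor', 'Master or Above']

encoder = LabelEncoder().fit(labels)

# classes_ holds the labels in sorted order; the position is the integer code
education_codes = {str(label): code for code, label in enumerate(encoder.classes_)}
print(education_codes)
```

Saving the fitted encoders with joblib next to the model, and loading them in the app, is one way to avoid maintaining these dictionaries manually.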
How to Run Streamlit App
    # Install streamlit if not installed
    pip install streamlit

    # Run the app
    streamlit run loan_prediction_app.py
Conclusion
👉 We’ve successfully built a Loan Prediction Machine Learning Model using Python and RandomForest.
Key takeaways:
- ML helps financial institutions automate loan approvals.
- The model is trained on Principal, terms, past_due_days, age, education, and Gender; judge its predictive power from the test-set accuracy and classification report rather than assuming it.
- You can further improve it using:
  - Advanced models (XGBoost, LightGBM)
  - Hyperparameter tuning
  - Feature engineering
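Hyperparameter tuning, one of the improvements listed above, could be sketched with scikit-learn's `GridSearchCV`. The example runs on synthetic data from `make_classification` as a stand-in for `X_train` and `y_train`; the grid values and the choice of F1 as the scoring metric are assumptions, not tuned recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the training data
X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=42)

# Illustrative grid; the values are assumptions, not tuned recommendations
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, None],
}

# 3-fold cross-validation over every combination in the grid
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=3,
    scoring="f1",
)
search.fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated F1: {search.best_score_:.3f}")
```

After the search, `search.best_estimator_` is a model refit on all the data passed to `fit`, and it can be evaluated on the held-out test set in the same way as `rf_model`.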
Real-world deployment:
- Integrate this model into banking software
- Enable real-time decision-making at scale
🚀 Want to take your Data Science skills to the next level?
👉 Practice more ML projects on real-world datasets.
👉 Follow this tutorial to create a complete portfolio project.
🔍 More Machine Learning Project Ideas to Sharpen Your Skills
Looking to expand your machine learning portfolio? Here are some impactful project ideas that cover a wide range of real-world applications, well suited to data science learners and AI enthusiasts:
⚖️ Personal Injury Case Outcome Prediction
Build a classification model that predicts the outcome of personal injury legal cases using historical court data and legal documents.
🚢 Titanic Dataset – Exploratory Data Analysis & Prediction
Perform in-depth exploratory data analysis (EDA) on the Titanic dataset, uncover hidden patterns, and develop predictive models to forecast survival outcomes.
🛒 Big Mart Sales Prediction Project (2025 Edition)
Use regression techniques to forecast sales across various Big Mart outlets by analyzing product features, store types, and seasonal demand.
🥔 Potato Leaf Disease Detection Using Deep Learning
Apply computer vision techniques to classify diseases in potato leaves. Utilize CNN-based models for accurate plant health diagnostics.
✋ Hand Gesture Recognition with Deep Learning
Design a deep learning system that recognizes hand gestures in real-time using Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs).
🚗 Car Accident Attorney Case Viability Prediction
Predict the profitability or success potential of automobile accident cases for attorneys using legal data and case history.
💳 Credit Card Fraud Detection System
Detect fraudulent transactions using machine learning and anomaly detection techniques. Focus on precision and real-time prediction to reduce financial risk.
🛡️ Insurance Claim Severity Modeling
Forecast the severity of insurance claims by analyzing policyholder profiles, claim types, and incident data using advanced regression or XGBoost models.