Introduction
Gold, often viewed as a safe-haven asset, plays an important role in both investment portfolios and economic stability. Forecasting its price can give investors, traders, and policymakers actionable insights.
In this machine learning project, we aim to predict gold prices using historical financial data and deploy the model using Streamlit, a popular Python web framework. The project includes data preprocessing, exploratory data analysis (EDA), model building, evaluation, and web deployment.
Why Predict Gold Prices?
Several macroeconomic factors affect gold prices:
- Inflation
- Currency strength (e.g., USD Index)
- Oil and energy prices
- Stock indices (S&P 500, Dow Jones)
- Global market sentiment
Accurate prediction helps:
- Make informed investment decisions
- Design hedging strategies
- Forecast economic trends
Project Overview
- Goal: Predict gold prices using ML models based on financial and commodity market indicators.
- Steps:
  - Collect and clean data
  - Perform EDA
  - Feature engineering
  - Train ML models
  - Evaluate performance
  - Deploy using Streamlit
Dataset Description
The dataset (FINAL_USO.csv) contains 1,718 records with 81 columns, including:
- Gold prices (Open, High, Low, Close, Adj Close, Volume)
- Market indicators: S&P 500, Dow Jones, Euro, Oil, Silver, Platinum, Palladium
- Trends & Prices: Daily trends for other financial instruments (USD Index, ETFs like GDX, USO)
Dataset Name: FINAL_USO.csv
Download Link: Download Gold Price Prediction Dataset (CSV)
Each row represents daily market data, making it ideal for time-series analysis.
Data Preprocessing
Steps:
- Convert Date to datetime format
- Handle missing values (if any)
- Normalize or scale numeric features
- Split into features (X) and target (y = Close price of gold)
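import pandas as pd
# Load the dataset
df = pd.read_csv('FINAL_USO.csv')
# Convert Date and set index
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)
# Define target (gold Close) and features (everything except the gold closing prices)
y = df['Close']
X = df.drop(columns=['Close', 'Adj Close'])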
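The snippet above covers the date conversion and the feature/target split. The two remaining steps, handling missing values and scaling, could look roughly like the sketch below. It runs on a small synthetic frame rather than FINAL_USO.csv, and the column names (`SP_open`, `USO_Close`), the forward-fill strategy, and the 80/20 cut are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (values and column names are illustrative)
dates = pd.date_range("2020-01-01", periods=10, freq="D")
X_demo = pd.DataFrame(
    {
        "SP_open": [100.0, 101.0, np.nan, 103.0, 104.0, 105.0, 106.0, np.nan, 108.0, 109.0],
        "USO_Close": [10.0, 10.5, 11.0, np.nan, 12.0, 12.5, 13.0, 13.5, 14.0, 14.5],
    },
    index=dates,
)

# Forward-fill gaps so each missing value takes the last known observation,
# then drop any rows that are still missing at the very start of the series
X_filled = X_demo.ffill().dropna()

# Chronological 80/20 split (no shuffling)
split = int(len(X_filled) * 0.8)
X_train_demo, X_test_demo = X_filled.iloc[:split], X_filled.iloc[split:]

# Fit the scaler on the training rows only, then reuse it for the test rows
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_demo)
X_test_scaled = scaler.transform(X_test_demo)

print("Missing values left:", X_filled.isna().sum().sum())
print("Training column means after scaling:", X_train_scaled.mean(axis=0).round(6))
```

Fitting the scaler on the training rows alone keeps test-period statistics out of the model. Scaling matters mainly for Linear Regression; tree-based models such as Random Forest are largely insensitive to it.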
Exploratory Data Analysis (EDA)
EDA helps uncover relationships between gold and other financial indicators.
- Correlation heatmap to find influential variables
- Line plots for trends (Gold vs Oil, Gold vs USD Index)
- Lag analysis to check temporal dependencies
import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(14,8))
sns.heatmap(df.corr(numeric_only=True)[['Close']].sort_values('Close', ascending=False), annot=True, cmap='coolwarm')
plt.title('Feature Correlation with Gold Close Price')
plt.show()
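The lag analysis listed above is not shown in the code. A minimal sketch using pandas `Series.autocorr` follows. It uses a synthetic random-walk series as a stand-in for `df['Close']`, and the lags (1, 5, 20) are arbitrary choices.

```python
import numpy as np
import pandas as pd

# Synthetic random walk standing in for df['Close'] (assumption: not real gold data)
rng = np.random.default_rng(42)
close = pd.Series(1200 + rng.normal(0, 5, size=500).cumsum())

# Autocorrelation of price levels: how strongly today's level tracks past levels
level_acf = {lag: close.autocorr(lag=lag) for lag in (1, 5, 20)}

# Autocorrelation of daily changes: whether the moves themselves persist
changes = close.diff().dropna()
change_acf = {lag: changes.autocorr(lag=lag) for lag in (1, 5, 20)}

for lag in (1, 5, 20):
    print(f"lag {lag:>2}: level={level_acf[lag]:.3f}  change={change_acf[lag]:.3f}")
```

On real data, the same two calls can be applied to `df['Close']` and `df['Close'].diff()`. High autocorrelation in levels combined with low autocorrelation in changes would suggest that lagged prices carry most of the signal.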
Feature Engineering
Enhance predictive power with:
- Rolling means (5-day, 10-day)
- Price differentials (e.g., High - Low)
- Trend indicators
- Lagged features for time-series forecasting
# 5-day rolling mean of the gold close
df['Gold_Rolling_Mean_5'] = df['Close'].rolling(window=5).mean()
# Daily high-low range
df['Gold_Diff'] = df['High'] - df['Low']
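Lagged features are listed above but not shown. The sketch below builds them with `shift`, and also shifts the rolling mean by one day so that each row only uses information available before that day's close. The frame is synthetic, and the column names `Close_lag_1`, `Close_lag_5`, and `Close_roll_mean_5_prev` are made up for illustration.

```python
import pandas as pd

# Small synthetic frame standing in for the real dataset
df_demo = pd.DataFrame(
    {"Close": [float(v) for v in range(100, 112)]},
    index=pd.date_range("2020-01-01", periods=12, freq="D"),
)

# Lagged closes: yesterday's close and the close five days earlier
df_demo["Close_lag_1"] = df_demo["Close"].shift(1)
df_demo["Close_lag_5"] = df_demo["Close"].shift(5)

# Rolling mean of the previous five closes (shifted so today's close is excluded)
df_demo["Close_roll_mean_5_prev"] = df_demo["Close"].shift(1).rolling(window=5).mean()

# The first rows have no history, so they contain NaN and are dropped
features = df_demo.dropna()
print(features.head(3))
```

Note that in the flow shown earlier, X is defined before any engineered columns are added, so X would need to be rebuilt from the enriched frame for the model to see them.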
Model Building
Models Used:
- Linear Regression
- Random Forest
- XGBoost Regressor
Steps:
- Split data into train/test
- Train model
- Evaluate using MAE, RMSE, R²
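import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
# Chronological split: shuffle=False keeps the last 20% of rows as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
# RMSE is taken as the square root of MSE; the squared=False argument is deprecated since scikit-learn 1.4
print("MAE:", mean_absolute_error(y_test, predictions))
print("RMSE:", np.sqrt(mean_squared_error(y_test, predictions)))
print("R² Score:", r2_score(y_test, predictions))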
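Only the Random Forest is shown above. A rough way to compare it with Linear Regression on the same chronological split is sketched below, using synthetic data in place of the real features. XGBoost is left out so the example depends on scikit-learn only; `XGBRegressor` from the xgboost package follows the same fit/predict pattern and could be added to the dictionary.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Synthetic features and target (assumption: stands in for X and y from the dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 4))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=300)

# Chronological 80/20 split: the last 20% of rows form the test set
split = int(len(X_demo) * 0.8)
X_tr, X_te = X_demo[:split], X_demo[split:]
y_tr, y_te = y_demo[:split], y_demo[split:]

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

# Fit each model and collect the three metrics named in the steps above
results = {}
for name, estimator in models.items():
    estimator.fit(X_tr, y_tr)
    preds = estimator.predict(X_te)
    results[name] = {
        "MAE": mean_absolute_error(y_te, preds),
        "RMSE": float(np.sqrt(mean_squared_error(y_te, preds))),
        "R2": r2_score(y_te, preds),
    }
    print(name, {k: round(v, 3) for k, v in results[name].items()})
```

The synthetic target here is linear by construction, so Linear Regression will naturally do well. The ranking on real gold data may differ.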
Streamlit Deployment
Streamlit’s user-friendly UI makes deployment straightforward.
Streamlit App Code
import streamlit as st
st.title("Gold Price Prediction App")
# Assumes `model` and `X` from the training step are defined earlier in app.py
# User input
input_data = st.number_input("Enter S&P Open", min_value=0.0)
# Dummy example: the input fills the first feature slot and every other feature is set to 0
pred = model.predict([[input_data] + [0]*(X.shape[1]-1)])
st.write(f"Predicted Gold Price: ${pred[0]:.2f}")
Run Command:
streamlit run app.py
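The app code above relies on `model` being defined inside app.py. One common way to arrange that is to save the trained model with joblib after training and load it at the top of the app. The sketch below shows the save/load round trip on a tiny synthetic model; the file name `gold_model.joblib` is hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Train a tiny model on synthetic data (stand-in for the real training step)
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(50, 3))
y_demo = X_demo.sum(axis=1)
trained = RandomForestRegressor(n_estimators=10, random_state=42).fit(X_demo, y_demo)

# Save the model to disk; a temporary directory keeps the example self-contained
model_path = os.path.join(tempfile.mkdtemp(), "gold_model.joblib")
joblib.dump(trained, model_path)

# In app.py, the model would be loaded once before reading user input
loaded = joblib.load(model_path)
same = np.allclose(trained.predict(X_demo), loaded.predict(X_demo))
print("Predictions match after reload:", same)
```

With this arrangement, the app can read the expected number of inputs from `loaded.n_features_in_` instead of depending on `X`.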
Code Walkthrough Summary
| Section | Description |
|---|---|
| Data Preprocessing | Cleaned and transformed raw data for analysis |
| EDA | Identified strong correlates and visualized trends |
| Feature Engineering | Created rolling means, lags, and price differentials |
| Model Building | Used ML models to train and predict gold prices |
| Deployment | Built a Streamlit app for user-friendly interaction |
Conclusion
Gold price prediction using machine learning offers data-driven insights into market dynamics. By combining financial indicators, time-series techniques, and modern ML algorithms, we can forecast trends and support better decisions.
Deploying it with Streamlit makes it even more powerful by bringing interactivity to data science.
What machine learning algorithm is best for gold prediction?
Random Forest and XGBoost often outperform linear models because they are robust to noisy inputs and can capture non-linear relationships.
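Which model wins depends on the data, so it is worth checking on your own features. One option, offered here as a sketch rather than as part of the original project, is walk-forward validation with scikit-learn's `TimeSeriesSplit`, where each fold trains on earlier rows and tests on later ones. The data below is synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Synthetic non-linear target (assumption: stands in for the real features and gold close)
rng = np.random.default_rng(7)
X_demo = rng.normal(size=(200, 3))
y_demo = np.sin(X_demo[:, 0]) + 0.5 * X_demo[:, 1] ** 2 + rng.normal(scale=0.05, size=200)

tscv = TimeSeriesSplit(n_splits=5)

# In every fold, all training indices come before all test indices
ordered = all(train_idx.max() < test_idx.min() for train_idx, test_idx in tscv.split(X_demo))

cv_mae = {}
candidates = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=42),
}
for name, estimator in candidates.items():
    # scikit-learn reports negated MAE so that higher is better; flip the sign back
    scores = -cross_val_score(estimator, X_demo, y_demo, cv=tscv, scoring="neg_mean_absolute_error")
    cv_mae[name] = scores.mean()
    print(f"{name}: mean MAE over {len(scores)} folds = {cv_mae[name]:.3f}")
```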
Can I use this project in my data science portfolio?
Absolutely! It demonstrates EDA, feature engineering, modeling, and deployment skills.
Do I need deep learning for this project?
Not necessarily. Tree-based models work well for structured financial data.
Can I predict gold prices in real-time?
Yes, in principle: feed the model live data from a market-data API and retrain it periodically.
How do I host this Streamlit app online?
Use platforms like Streamlit Cloud, Heroku, or Render.
Ready to dive into data-driven finance? Start your Gold Price Prediction Project today and level up your data science and machine learning skills.
More Machine Learning Project Ideas to Improve Your Capabilities
If you’re eager to grow your machine learning portfolio and dive into hands-on applications, here’s a list of diverse and practical project ideas. These projects are especially valuable for data science learners, machine learning students, and AI enthusiasts looking to stand out.
Legal Outcome Predictor: Personal Injury Cases
Create a classification model to forecast the verdict of personal injury lawsuits using past court rulings, case attributes, and structured legal datasets.
Titanic Data Survival Analysis
Leverage the iconic Titanic dataset to conduct a detailed exploratory data analysis (EDA). Build classification models to determine passenger survival based on demographics and ticket details.
Big Mart Sales Forecasting (2025 Edition)
Apply regression techniques to predict sales for various Big Mart stores. Use features like product type, store size, promotional discounts, and seasonal trends to improve accuracy.
Deep Learning for Potato Leaf Disease Detection
Develop a deep learning solution to identify diseases in potato leaves using image classification. Implement Convolutional Neural Networks (CNNs) to support precision agriculture.
Real-Time Hand Gesture Recognition
Build an intelligent system that detects and classifies hand gestures using deep learning. Combine CNNs with RNNs or LSTMs for real-time gesture control applications.
Smart Credit Card Fraud Detection
Design a fraud detection system that identifies suspicious transactions using anomaly detection and supervised learning. Emphasize real-time decision-making and financial security.
Predicting Insurance Claim Severity
Model the expected severity of insurance claims using regression techniques. Analyze policyholder data, claim types, and incident history to estimate financial impact with high accuracy.