As the second month of your Python data analysis journey begins, you move on to more advanced analysis techniques. This month focuses on using libraries such as Scikit-learn for machine learning and on working with real-world problems. You will work on two main projects, detecting fake news and predicting stock prices from historical market data, which build proficiency in data cleaning, feature engineering, and model building.
Weeks 5 and 6: Fake News Detection
Aim: The first project this month is to build a classification model that detects fake news. Because it works with text data, it requires preprocessing before any analysis or modeling.
Project Breakdown:
- Data Cleaning: Begin by cleaning the text to remove noise such as HTML tags, punctuation marks, and stop words. Cleaner input generally makes the model more accurate.
- TF-IDF Vectorization: After cleaning, convert the text into a numerical matrix that a machine learning model can work with. You will represent each document as a numerical vector using Term Frequency-Inverse Document Frequency (TF-IDF), which weights each word by how important it is to a document relative to the whole collection.
- Model Building: In the final step, you use Scikit-learn to train a Logistic Regression model, one of the most popular algorithms for binary classification. You train it on a labeled dataset of news articles so that it learns the characteristics that distinguish fake news from real news.
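The three steps above can be sketched as a single Scikit-learn pipeline. This is a minimal illustration, not the project's reference solution: the `clean_text` helper, the six headlines, and their labels are all made up for the example, and a real project would train on a proper labeled dataset with a held-out test set.

```python
import html
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def clean_text(raw: str) -> str:
    """Hypothetical helper: strip HTML tags and punctuation, then lowercase."""
    text = html.unescape(raw)                  # decode entities such as &amp;
    text = re.sub(r"<[^>]+>", " ", text)       # drop HTML tags
    text = re.sub(r"[^a-zA-Z\s]", " ", text)   # drop punctuation and digits
    return re.sub(r"\s+", " ", text).strip().lower()


# Made-up toy data for illustration only: 1 = fake, 0 = real.
texts = [
    "<p>SHOCKING!!! Doctors HATE this one weird trick, click now</p>",
    "City council approves budget for new public library",
    "You won't BELIEVE what this celebrity secretly did!!!",
    "Central bank holds interest rates steady after quarterly meeting",
    "Miracle cure discovered, share before they delete this",
    "University researchers publish study on local water quality",
]
labels = [1, 0, 1, 0, 1, 0]

# TfidfVectorizer applies clean_text to each document, removes English
# stop words, and builds the TF-IDF matrix that Logistic Regression fits on.
model = make_pipeline(
    TfidfVectorizer(preprocessor=clean_text, stop_words="english"),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

# Predictions on the training texts, only to show the call pattern.
predictions = model.predict(texts)
```

Keeping the cleaning, vectorization, and model in one pipeline means the same preprocessing is applied at training and prediction time, which avoids a common source of mismatched features.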
Weeks 7 and 8: Stock Market Price Prediction
Aim: The second project forecasts stock market prices, a topic of widespread interest and usefulness, using data analysis and machine learning techniques.
Project Breakdown:
- Data Gathering: Collect historical stock price data. Libraries such as yfinance make this data easy to download.
- Feature Engineering: Once the data is ready, it's time for feature engineering. You will create features relevant to the problem, such as moving averages and daily returns, that may help predict stock prices.
- Model Development: You will learn to build regression models with Scikit-learn, such as Linear Regression, or more advanced techniques such as Random Forest Regression. You will then train your model to predict future stock prices based on historical data that is already available.
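The feature engineering and model development steps above might look roughly like the following. To keep the sketch runnable offline, it uses a synthetic random-walk price series instead of downloaded data; the column names, the window sizes of 5 and 20 days, and the next-day-close target are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic closing prices stand in for real data (assumption: a real
# project would load a closing-price column from a source such as yfinance).
rng = np.random.default_rng(seed=0)
dates = pd.date_range("2023-01-02", periods=200, freq="B")  # business days
close = pd.Series(100 + rng.normal(0, 1, size=200).cumsum(), index=dates)

# Feature engineering: daily returns and two moving averages.
features = pd.DataFrame({
    "close": close,
    "daily_return": close.pct_change(),
    "ma_5": close.rolling(window=5).mean(),
    "ma_20": close.rolling(window=20).mean(),
})

# Target: the next trading day's closing price.
features["target"] = close.shift(-1)

# Rolling windows and the shift leave NaNs at both ends; drop those rows.
features = features.dropna()

# Time-ordered split: train on the earlier 80%, test on the later 20%.
# Shuffling would leak future information into the training set.
split = int(len(features) * 0.8)
train, test = features.iloc[:split], features.iloc[split:]
feature_cols = ["close", "daily_return", "ma_5", "ma_20"]

model = LinearRegression().fit(train[feature_cols], train["target"])
predictions = model.predict(test[feature_cols])
```

Swapping `LinearRegression` for `RandomForestRegressor` from `sklearn.ensemble` requires no other changes to this sketch, which makes it easy to compare the two approaches on the same features.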
Skills Gained:
- Time Series Analysis: Work with data recorded at regular intervals.
- Feature Engineering: Learn how to create new features that improve the model's predictions.
- Regression Models: Learn the ins and outs of several regression techniques for making predictions.
This work builds technical skills in financial analysis and also gives a deeper understanding of market dynamics and trends.
Month 3: Capstone Project – Customer Churn Prediction
In the third month, you will work on a larger project that brings together everything learned so far: building an end-to-end solution for Customer Churn Prediction.
Weeks 9–12: Customer Churn Prediction
Goal: The objective is to analyze customer data to identify customers who are likely to churn. Businesses can use this analysis to gain insight into better customer retention strategies.
Project Breakdown:
- Analyze the Data: The first step is to explore the customer data and identify the important features that relate to churn.
- Clean the Data: Handle missing data and get the dataset ready for modeling.
- Feature Selection: Identify the best features for prediction.
- Model Development: Build classification models using Scikit-learn and XGBoost, with a focus on hyperparameter tuning and performance evaluation.
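One possible shape for the cleaning, feature selection, and tuning steps above is shown below. Several things here are assumptions made for the sake of a self-contained example: the data is synthetic, missing values are injected artificially, and Scikit-learn's `GradientBoostingClassifier` is used as a stand-in for XGBoost so the sketch runs with Scikit-learn alone. The parameter grid is deliberately tiny.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in for customer data: about 20% of rows are "churned" (1).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4,
    weights=[0.8, 0.2], random_state=0,
)

# Blank out roughly 5% of the values to mimic missing data.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

# Stratify so both splits keep the same churn rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),        # clean the data
    ("select", SelectKBest(score_func=f_classif, k=4)),  # feature selection
    ("model", GradientBoostingClassifier(random_state=0)),
])

# Hyperparameter tuning: try each combination with 3-fold cross-validation
# and keep the one with the best F1 score on the churn class.
param_grid = {
    "select__k": [4, 8],
    "model__max_depth": [2, 3],
}
search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=3)
search.fit(X_train, y_train)

# F1 score of the best pipeline on the held-out test set.
test_score = search.score(X_test, y_test)
```

Because imputation and feature selection sit inside the pipeline, they are refit on each cross-validation fold, so the validation folds never influence the preprocessing.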
Skills Gained:
- Data Preparation and Cleaning: Learn to prepare and clean data for modeling.
- Feature Selection: Prepare the most important features.
- Machine Learning and Model Evaluation: Learn metrics such as precision, recall, and F1 score to evaluate your model's performance and check its robustness.
- XGBoost: One of the most powerful machine learning algorithms available, and valuable experience if you have not yet used it, given its use in competitions and industry.
This final project brings together everything learned and helps you get comfortable with your data skills, so that you are prepared when faced with a real data problem outside this program.
Skills Covered: querying databases using SQL, exploratory data analysis, summarization, visualization, and building prediction models.
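To make the precision, recall, and F1 score from the skills list above concrete, here is a small hand-rolled calculation. The function name and the example labels are hypothetical; in practice, `sklearn.metrics` provides `precision_score`, `recall_score`, and `f1_score` for the same purpose.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one class from two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Precision: of the customers flagged as churners, how many really churned?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the customers who really churned, how many were flagged?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Made-up labels: 1 = churned, 0 = stayed.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Churn data is usually imbalanced, with far fewer churners than loyal customers, so these metrics tend to say more about a model than plain accuracy does.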
Project Submission Guidelines:
Once you have completed your projects, you are required to share either a video or snapshots showcasing your work on LinkedIn within the specified deadlines. Be sure to tag the official BiStartX LinkedIn page. You can find our profile here: BiStartX.
- Submission Process: After uploading your projects, submit the links to your LinkedIn post and GitHub repository via the provided submission form. Ensure that both links are public so we can verify your work.
- This is the final submission step—once you have completed all your projects, fill out the submission form.
Project Requirements:
- Complete a minimum of 3 assigned projects to qualify for the Internship Completion Certificate.
- To receive a Letter of Recommendation, at least 4 projects must be completed.
- Completing 4 projects is also required to earn swag and additional rewards, and all available shifts must be completed for eligibility.