Tech Stack
Python
Machine Learning
NumPy
Pandas
Matplotlib
Seaborn
scikit-learn
Description
This project predicts loan approval status from applicant financial and demographic data. A machine learning pipeline preprocesses the data, trains several ML models, and evaluates their performance.
- Designed a data preprocessing pipeline using Pandas: handled missing values, encoded categorical variables, applied log transformations, and normalized numerical features.
- Implemented and compared multiple ML models: Logistic Regression, Random Forest, Decision Tree, K-Nearest Neighbors (KNN), Support Vector Machine (SVM).
- Evaluated each model's performance using accuracy, F1-score, and confusion matrices with scikit-learn.
- Conducted feature analysis and visualized the results with Matplotlib and Seaborn: feature importance (CreditHistory was identified as the most critical predictor), model metrics, feature distributions, and relationships between variables.
- Achieved 78% accuracy with the Random Forest model, giving financial institutions a data-driven aid for loan approval decisions.
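
The steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. It runs on synthetic data, and every column name except CreditHistory (ApplicantIncome, LoanAmount, Education, LoanStatus) is a hypothetical stand-in, as are the helper functions make_synthetic_loans, preprocess, and compare_models. The imputation strategy (median and mode), the 75/25 split, and the model hyperparameters are also assumptions. Scores produced here say nothing about the 78% reported above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def make_synthetic_loans(n=400, seed=0):
    """Build a synthetic stand-in for the loan data (hypothetical columns)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "ApplicantIncome": rng.lognormal(mean=8.5, sigma=0.6, size=n),
        "LoanAmount": rng.lognormal(mean=4.8, sigma=0.5, size=n),
        "CreditHistory": rng.choice([0.0, 1.0], size=n, p=[0.2, 0.8]),
        "Education": rng.choice(["Graduate", "Not Graduate"], size=n),
    })
    # Assumption: approval probability depends only on credit history.
    p_approve = np.where(df["CreditHistory"] == 1.0, 0.8, 0.2)
    df["LoanStatus"] = np.where(rng.random(n) < p_approve, "Y", "N")
    # Blank out a few cells to mimic missing values.
    df.loc[rng.choice(n, size=20, replace=False), "LoanAmount"] = np.nan
    df.loc[rng.choice(n, size=20, replace=False), "Education"] = np.nan
    return df


def preprocess(df):
    """Impute, log-transform, and encode; returns features, target, numeric columns."""
    df = df.copy()
    num_cols = ["ApplicantIncome", "LoanAmount"]
    # Missing values: median for numeric columns, mode for the categorical one.
    for col in num_cols:
        df[col] = df[col].fillna(df[col].median())
    df["Education"] = df["Education"].fillna(df["Education"].mode()[0])
    # Log transform to reduce right skew in monetary amounts.
    df[num_cols] = np.log1p(df[num_cols])
    # One-hot encode the categorical feature; map the target to 0/1.
    X = pd.get_dummies(df.drop(columns="LoanStatus"), columns=["Education"],
                       drop_first=True, dtype=float)
    y = (df["LoanStatus"] == "Y").astype(int)
    return X, y, num_cols


def compare_models(X, y, num_cols, seed=0):
    """Train the five classifiers and collect test-set metrics for each."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)
    X_train, X_test = X_train.copy(), X_test.copy()
    # Normalize numeric features; the scaler is fit on the training split only.
    scaler = StandardScaler().fit(X_train[num_cols])
    X_train[num_cols] = scaler.transform(X_train[num_cols])
    X_test[num_cols] = scaler.transform(X_test[num_cols])

    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "SVM": SVC(),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, pred),
            "f1": f1_score(y_test, pred),
            "confusion_matrix": confusion_matrix(y_test, pred, labels=[0, 1]),
        }
    # Impurity-based feature importances from the fitted Random Forest.
    importances = pd.Series(models["Random Forest"].feature_importances_,
                            index=X_train.columns).sort_values(ascending=False)
    return results, importances


X, y, num_cols = preprocess(make_synthetic_loans())
results, importances = compare_models(X, y, num_cols)
for name, metrics in results.items():
    print(f"{name}: accuracy={metrics['accuracy']:.3f}, f1={metrics['f1']:.3f}")
print(importances)
```

The scaler is fit after the split so that test-set statistics do not leak into training. For brevity the sketch imputes before splitting, which a stricter setup would also move inside the training fold. The importances Series can be passed to a Matplotlib or Seaborn bar plot to produce a feature-importance chart like the one described above.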