Project Analysis
HR Salary Prediction
About Project
The HR salary dataset from Kaggle supports analysis of employee compensation trends. It contains employee demographics, job-related details, and compensation figures. These data can be used to identify factors that influence salary, such as experience, education, and job title, and to assess organisational pay equity.
Problem Statement
TCS HR Dashboard Requirements
- Predict salaries of new hires or internal transfers
- Justify salary revisions based on experience, skills, and performance
- Ensure consistent and fair compensation using analytics
Objective
Project Objectives
- Create a machine learning model that predicts employee salaries
- Build an interactive web dashboard that HR managers can access
- Improve prediction accuracy through feature engineering and model tuning
Proposed Solution
Data Summary
- Used an HR dataset with 311 entries and 38 features
- Cleaned and reduced the dataset to 8 key features (e.g., age, experience, department, race, project count, performance score)
Modeling
- Trained and compared multiple models:
  - Linear Regression
  - Random Forest
  - Gradient Boosting (with and without hyperparameter tuning)
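The comparison above can be sketched roughly as follows. This is a minimal illustration, not the project's actual training script: the data are synthetic (only the 311 x 8 shape mirrors the summary above), and the split ratio and model settings are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned HR data (NOT the real dataset):
# 311 rows and 8 numeric features, matching only the shape described above.
rng = np.random.default_rng(0)
X = rng.normal(size=(311, 8))
y = 60000 + 8000 * X[:, 0] + 5000 * X[:, 1] + rng.normal(scale=3000, size=311)

# Assumed 80/20 split; the original ratio is not stated.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name}: R2 = {score:.3f}")
```

Scoring every model on the same held-out split keeps the comparison like for like. The numbers printed here say nothing about the real dataset.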
Deployment
- Built and hosted a Flask-based interactive web dashboard on PythonAnywhere
Technologies Used
Languages
- Python
Libraries
- Pandas
- Scikit-learn (including GridSearchCV for hyperparameter tuning)
- Matplotlib
Frameworks
- Flask
- HTML (for dashboard)
Deployment
- PythonAnywhere
Tools
- Google Colab
- Spyder
Challenges Faced
- Small dataset size made overfitting a concern
- Target variable (salary) was skewed, requiring transformation
- Feature selection was non-trivial due to missing or redundant fields
- Needed to derive features such as age and experience from date of birth and hire/termination dates
Methodology
Business Understanding
- Identify salary influencers such as experience, age, and performance
Data Cleaning
- Filled missing date values with the current date (e.g., a blank termination date, which typically indicates a still-active employee)
- Removed uninformative columns (e.g., Employee ID, Manager Name)
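A minimal sketch of these two cleaning steps with pandas is shown below. The column names (`EmpID`, `ManagerName`, `DateofHire`, `DateofTermination`, `Salary`) and the toy rows are assumptions for illustration; the real dataset's names may differ.

```python
import pandas as pd

# Toy frame with hypothetical column names and made-up values.
df = pd.DataFrame(
    {
        "EmpID": [1, 2, 3],
        "ManagerName": ["A. Rao", "B. Singh", "A. Rao"],
        "DateofHire": ["2015-03-01", "2018-07-15", "2012-01-10"],
        "DateofTermination": ["2019-06-30", None, None],
        "Salary": [62000, 71000, 85000],
    }
)

# Parse the date columns; missing values become NaT.
for col in ["DateofHire", "DateofTermination"]:
    df[col] = pd.to_datetime(df[col])

# Treat a missing termination date as "still employed" and use today's date.
today = pd.Timestamp.today().normalize()
df["DateofTermination"] = df["DateofTermination"].fillna(today)

# Drop identifier-like columns that are not useful as predictors.
df = df.drop(columns=["EmpID", "ManagerName"])

print(df)
```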
Feature Engineering
- Extracted Age from DOB and Experience from Date of Hire
- Encoded categorical variables
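The derivation of Age and Experience, followed by encoding, might look like the sketch below. The column names, the fixed reference date, the 365.25-day year approximation, and the choice of one-hot encoding are all assumptions; the original encoding method is not stated.

```python
import pandas as pd

# Toy rows with hypothetical column names and made-up values.
df = pd.DataFrame(
    {
        "DOB": pd.to_datetime(["1985-05-20", "1990-11-02"]),
        "DateofHire": pd.to_datetime(["2012-01-10", "2018-07-15"]),
        "Department": ["IT/IS", "Sales"],
    }
)

# Fixed reference date so the example is reproducible (assumption).
reference = pd.Timestamp("2024-01-01")

# Approximate completed years using an average year length of 365.25 days.
df["Age"] = ((reference - df["DOB"]).dt.days // 365.25).astype(int)
df["Experience"] = ((reference - df["DateofHire"]).dt.days // 365.25).astype(int)

# Drop the raw dates and one-hot encode the categorical column.
df = pd.get_dummies(df.drop(columns=["DOB", "DateofHire"]), columns=["Department"])

print(df)
```

For these rows the derived ages are 38 and 33, and the experience values are 11 and 5 years.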
Exploratory Data Analysis (EDA)
- Used correlation heatmaps and scatter plots
- Found that project count and performance score had the strongest correlation with salary
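The ranking behind a correlation heatmap can be sketched as below. The data are synthetic and deliberately constructed so that project count dominates; the column names are hypothetical, and nothing here reproduces the project's actual figures.

```python
import numpy as np
import pandas as pd

# Synthetic data (NOT the real dataset), built so that salary depends
# on project count and performance score but not on age.
rng = np.random.default_rng(1)
n = 311
projects = rng.integers(0, 9, size=n)
performance = rng.integers(1, 5, size=n)
age = rng.integers(22, 60, size=n)
salary = 50000 + 4000 * projects + 3000 * performance + rng.normal(0, 5000, n)

df = pd.DataFrame(
    {
        "ProjectCount": projects,
        "PerformanceScore": performance,
        "Age": age,
        "Salary": salary,
    }
)

# Pearson correlation of every feature with the target, strongest first.
corr_with_salary = (
    df.corr()["Salary"].drop("Salary").sort_values(ascending=False)
)
print(corr_with_salary)
```

The full matrix from `df.corr()` is what a heatmap would display, for example via Matplotlib's `imshow`. Correlation indicates association only, so a ranking like this is a starting point for feature selection and not proof of influence.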
Modeling
- Trained models using a train/test split
- Applied target transformation to handle salary skew
- Tuned Random Forest and Gradient Boosting using GridSearchCV
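One way to combine target transformation with grid search is sketched below for Gradient Boosting only. The log transform (`log1p`/`expm1`), the parameter grid, the 3-fold cross-validation, and the synthetic data are all assumptions; the source does not say which transformation or grid was used.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, right-skewed target (NOT the real salary data).
rng = np.random.default_rng(42)
X = rng.normal(size=(311, 8))
y = np.exp(11 + 0.3 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(scale=0.1, size=311))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on log1p(y) and map predictions back to the salary scale with expm1.
model = TransformedTargetRegressor(
    regressor=GradientBoostingRegressor(random_state=42),
    func=np.log1p,
    inverse_func=np.expm1,
)

# Hypothetical small grid; the "regressor__" prefix routes each
# parameter to the wrapped Gradient Boosting model.
param_grid = {
    "regressor__n_estimators": [50, 100],
    "regressor__max_depth": [2, 3],
    "regressor__learning_rate": [0.05, 0.1],
}

search = GridSearchCV(model, param_grid, cv=3, scoring="r2")
search.fit(X_train, y_train)

predictions = search.predict(X_test)
test_r2 = search.score(X_test, y_test)
print(search.best_params_)
print(f"Held-out R2: {test_r2:.3f}")
```

Wrapping the regressor keeps the transform and its inverse together, so predictions come back on the original salary scale. With only 311 rows, cross-validating inside the training split also avoids tuning against the test set.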
Deployment
- Created HTML forms and Flask routes
- Hosted the app on PythonAnywhere
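A stripped-down version of such a form and route pair might look like this. The route names, form fields, and the `predict_salary` placeholder with its made-up coefficients are all hypothetical; the real app would load the trained model instead.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Inline template for brevity; a real app would likely use a templates/ folder.
FORM = """
<form method="post" action="/predict">
  Experience (years): <input name="experience" type="number" step="0.1" required>
  Project count: <input name="projects" type="number" required>
  <button type="submit">Predict</button>
</form>
{% if salary is not none %}
<p>Predicted salary: {{ "%.2f"|format(salary) }}</p>
{% endif %}
"""


def predict_salary(experience: float, projects: int) -> float:
    """Placeholder for the trained model; the coefficients are made up."""
    return 40000.0 + 2500.0 * experience + 1500.0 * projects


@app.route("/")
def index():
    # Show the empty form.
    return render_template_string(FORM, salary=None)


@app.route("/predict", methods=["POST"])
def predict():
    # Read the submitted fields, run the (placeholder) model, re-render.
    experience = float(request.form["experience"])
    projects = int(request.form["projects"])
    salary = predict_salary(experience, projects)
    return render_template_string(FORM, salary=salary)
```

Locally this can be served with Flask's development server (`flask run`). On a WSGI host such as PythonAnywhere, the host's WSGI configuration would import the `app` object.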
Result / Outcome
Best Model
- Tuned Gradient Boosting
Performance
- R² score: 0.76 (judged acceptable for business use)
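For reference, R² is one minus the ratio of the residual sum of squares to the total sum of squares, so 0.76 means the model explains about 76% of the variance in salary on the evaluated data. The sketch below computes it by hand on four made-up salaries, unrelated to the project's results.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration only.
actual = [50000, 60000, 70000, 80000]
predicted = [52000, 58000, 73000, 79000]

score = r2_score(actual, predicted)
print(round(score, 3))  # 0.964
```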
Business Impact
- TCS HR can now predict salaries with improved accuracy and transparency