Project Analysis
FIFA 19 Player Value Prediction & Recruitment Strategy
About Project
We decided to analyze player data from the game FIFA 19. The game features a large number of players from different countries, playing in different competitions, and their in-game abilities are meant to reflect their real-world skills. The game's creators pursue this by defining attributes, such as sprint speed, shot power, finishing, and heading, that can be expressed numerically. In this project, we try to classify players by playing position based on these attributes and to predict their in-game market value. Because the game does not always show a player's market value, only his attributes, our model can help determine how much to offer when negotiating a transfer in the game. In the real world, a player's abilities (e.g., finishing) carry no numerical rating. We take advantage of the fact that the game does express them numerically to find out which attributes matter most for particular playing positions.
Problem Statement
Football clubs face the challenge of selecting the right players from a global pool with varying skill levels, prices, and growth potential. Relying solely on scouts and intuition can lead to overpriced or underperforming signings. Clubs like Manchester United need a data-driven approach to:
- Predict accurate player market values
- Identify high-potential, affordable players
- Support recruitment with concrete metrics
Objective
- Analyze the FIFA 19 dataset to identify patterns in player performance and valuation.
- Build a machine learning model to predict players' market value.
- Use the model to assist Manchester United in selecting 5 new signings based on specific criteria: young age, high potential, and affordability.
- Cleaned and transformed the FIFA dataset (89 columns, 18k+ players)
- Engineered new features like Potential Gap
- Transformed monetary columns: Value, Wage, Release Clause
- Linear Regression (with and without PCA)
- Random Forest
- Gradient Boosting
- XGBoost (multiple variants)
- Support Vector Machines
- Age less than 27
- Overall rating greater than 80
- Positive Potential Gap
- Value and Release Clause under €80M
- Matched top players to Manchester United’s position needs: RW, LB, RM, CM, CB
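The selection criteria above can be sketched as a simple filter. This is a minimal illustration in plain Python, not the project's own code: the field names (`age`, `overall`, `potential`, `value`, `release_clause`) and the sample rows are hypothetical, and the position-matching step is left out.

```python
MAX_FEE = 80_000_000  # EUR 80M ceiling applied to both Value and Release Clause


def potential_gap(player):
    """Potential Gap = Potential - Overall."""
    return player["potential"] - player["overall"]


def meets_criteria(player):
    """Apply the shortlist rules: young, highly rated, room to grow, affordable."""
    return (
        player["age"] < 27
        and player["overall"] > 80
        and potential_gap(player) > 0
        and player["value"] < MAX_FEE
        and player["release_clause"] < MAX_FEE
    )


# Made-up rows purely for illustration; these are not taken from the dataset.
players = [
    {"name": "Player A", "age": 22, "overall": 83, "potential": 88,
     "value": 40e6, "release_clause": 70e6},
    {"name": "Player B", "age": 29, "overall": 86, "potential": 87,
     "value": 50e6, "release_clause": 75e6},   # too old
    {"name": "Player C", "age": 24, "overall": 85, "potential": 85,
     "value": 45e6, "release_clause": 60e6},   # no potential gap
    {"name": "Player D", "age": 23, "overall": 84, "potential": 90,
     "value": 60e6, "release_clause": 95e6},   # release clause too high
]

shortlist = [p["name"] for p in players if meets_criteria(p)]
print(shortlist)
```

Keeping each rule as its own condition makes it easy to loosen a single threshold (for example the fee ceiling) and see how the shortlist changes.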
- Linear Regression
- Gradient Boosting
- XGBoost
- Random Forest
- Support Vector Machines
- Principal Component Analysis (PCA)
- Box-Cox Transformation
- Cross-validation
- Feature Selection
- FIFA 19 player dataset
- 18,207 rows
- 89 features
- Complex formatting in monetary features (symbols like €, M, K)
- Data sparsity in categorical fields like position and club
- Need for dummification and scaling before applying models
- High-dimensional correlation and multicollinearity management
- XGBoost and SVM were computationally intensive on the full dataset
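The monetary formatting issue noted above (values carrying €, M, and K) can be handled with a small parser. The sketch below is one possible approach in plain Python, assuming strings shaped like `€110.5M` or `€565K`; the function name `parse_money` and the choice to return `None` for unparseable input are illustrative assumptions, not the project's actual implementation.

```python
import re

# Multipliers for the suffixes used in the monetary columns.
_SUFFIX = {"K": 1_000, "M": 1_000_000}

# Optional euro sign, a number with optional decimals, optional K/M suffix.
_MONEY_RE = re.compile(r"€?\s*(\d+(?:\.\d+)?)\s*([KM]?)")


def parse_money(text):
    """Convert a string such as '€110.5M' or '€565K' to a float in euros.

    Returns None when the input is missing or does not match the expected
    shape (an assumption; missing values could be handled differently).
    """
    if text is None:
        return None
    match = _MONEY_RE.fullmatch(text.strip())
    if match is None:
        return None
    number, suffix = match.groups()
    return float(number) * _SUFFIX.get(suffix, 1)


for raw in ["€110.5M", "€565K", "€0", ""]:
    print(repr(raw), "->", parse_money(raw))
```

The same function would apply to the Value, Wage, and Release Clause columns, since they share the same symbol-and-suffix format.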
- Removed redundant/low-variance columns
- Eliminated irrelevant text and image-related data
- Converted "Value" and "Wage" from string to numeric format
- Created Potential Gap = Potential - Overall
- Performed PCA to reduce features from 89 to 47
- Applied correlation filtering to remove highly collinear variables
- Used the caret package for model training with 10-fold cross-validation
- Compared models using RMSE as the performance metric
- Shortlisted 5 ideal signings based on predicted value, availability, and performance fit
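To make the evaluation step concrete, here is a rough sketch of 10-fold cross-validation scored by RMSE. The project itself used caret; this is a stdlib-only Python illustration of the same idea, with hypothetical helper names (`k_fold_splits`, `cross_validated_rmse`, `fit_mean_baseline`) and a trivial mean-value baseline standing in for a real model. Folds here are interleaved without shuffling, which is a simplification.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def k_fold_splits(n_rows, k=10):
    """Yield (train_idx, test_idx) pairs; fold i holds rows i, i+k, i+2k, ..."""
    indices = list(range(n_rows))
    for fold in range(k):
        test_idx = indices[fold::k]
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        yield train_idx, test_idx


def cross_validated_rmse(xs, ys, fit, k=10):
    """Average held-out RMSE; fit(train_x, train_y) must return a predict(x) callable."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(ys), k):
        predict = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        predictions = [predict(xs[i]) for i in test_idx]
        scores.append(rmse([ys[i] for i in test_idx], predictions))
    return sum(scores) / len(scores)


def fit_mean_baseline(train_x, train_y):
    """Placeholder model: always predicts the mean of the training targets."""
    mean = sum(train_y) / len(train_y)
    return lambda x: mean


# Toy data purely for illustration; not drawn from the FIFA 19 dataset.
xs = list(range(20))
ys = [2.0 * x for x in xs]
print(cross_validated_rmse(xs, ys, fit_mean_baseline, k=10))
```

Any candidate model can be compared on the same folds by passing a different `fit` function, and the model with the lowest average RMSE is preferred.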
- XGBoost (Model 4) achieved the lowest RMSE: 0.7217
- T. Werner – RW
- F. Thauvin – RM
- Jorginho – CM
- D. Alaba – LB
- N. Süle – CB
- Total Budget: €186M
- Enabled data-driven recruitment decisions
- Helped avoid high-fee, high-risk player signings