This project began as a way to test whether old-school baseball pitcher statistics and the newer, SABR-style metrics differ in predictive power (results for models trained on each set of parameters appear below). A logical extension was a front end that displays game predictions and their accuracy.
To learn more, please visit the project's GitHub repo for the source code, an architecture diagram, and further explanations.
As it turns out, predicting the winner of a baseball game is hard. It's even harder if you only use half a dozen or so statistics about the starting pitchers. These statistics hardly paint the whole picture of a game: they don't even account for the opposing team's batting, not to mention the hundreds of other factors that go into a baseball game. I recognize these shortcomings, as well as the lackluster prediction results. I have considered adding more features to my data, but I would like to stick to statistics that are available before the game starts.

All Stats Model | |
---|---|
Date Created | 2024-01-28 20:21:48.047509 |
Model Type | SVC |
Parameters Used | pitcher_era_comp, pitcher_win_percentage_comp, pitcher_win_comp, pitcher_losses_comp, pitcher_innings_pitched_comp, pitcher_k_nine_comp, pitcher_bb_nine_comp, pitcher_k_bb_diff_comp, pitcher_whip_comp, pitcher_babip_comp, pitcher_k_bb_ratio_comp |
Parameter Count | 11 |
Accuracy | 0.573134328358209 |
Training Set Size | 1338 |
Testing Set Size | 335 |

Old-School Stats Model | |
---|---|
Date Created | 2024-01-28 20:21:48.047509 |
Model Type | KNeighborsClassifier |
Parameters Used | pitcher_era_comp, pitcher_win_percentage_comp, pitcher_win_comp, pitcher_losses_comp, pitcher_innings_pitched_comp |
Parameter Count | 5 |
Accuracy | 0.582089552238806 |
Training Set Size | 1338 |
Testing Set Size | 335 |

Modern Stats Model | |
---|---|
Date Created | 2024-01-28 20:21:48.047509 |
Model Type | NearestCentroid |
Parameters Used | pitcher_k_nine_comp, pitcher_bb_nine_comp, pitcher_k_bb_diff_comp, pitcher_whip_comp, pitcher_babip_comp, pitcher_k_bb_ratio_comp |
Parameter Count | 6 |
Accuracy | 0.5611940298507463 |
Training Set Size | 1338 |
Testing Set Size | 335 |
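
To put the accuracies in the tables above in context, here is a rough sketch of how much uncertainty a 335-game test set carries. It is not part of the project's code: the helper names `accuracy_interval` and `summarize` are hypothetical, and the calculation assumes each test game is an independent trial and uses a normal approximation to the binomial distribution.

```python
import math

# Reported test accuracies, copied from the tables above.
REPORTED = {
    "All Stats (SVC)": 0.573134328358209,
    "Old-School Stats (KNeighborsClassifier)": 0.582089552238806,
    "Modern Stats (NearestCentroid)": 0.5611940298507463,
}
TEST_SIZE = 335  # every model was scored on the same number of test games


def accuracy_interval(accuracy, n, z=1.96):
    """Approximate 95% interval for an accuracy measured on n games.

    Hypothetical helper: uses the normal approximation to the binomial,
    which assumes the n games are independent trials.
    """
    stderr = math.sqrt(accuracy * (1 - accuracy) / n)
    return accuracy - z * stderr, accuracy + z * stderr


def summarize(reported=REPORTED, n=TEST_SIZE):
    """Map each model name to (correct predictions, interval low, interval high)."""
    rows = {}
    for name, acc in reported.items():
        correct = round(acc * n)  # recover the whole number of correct picks
        low, high = accuracy_interval(acc, n)
        rows[name] = (correct, low, high)
    return rows


if __name__ == "__main__":
    for name, (correct, low, high) in summarize().items():
        print(f"{name}: {correct}/{TEST_SIZE} correct, "
              f"approx. 95% interval {low:.3f} to {high:.3f}")
```

Under these assumptions each interval is roughly plus or minus 0.05 wide, while the three reported accuracies differ by about 0.02 at most. The intervals overlap heavily, so a test set of this size is likely too small to separate the old-school and modern feature sets.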