
This project started as a way to test whether old-school baseball pitcher statistics and newer, SABR-style metrics differ in predictive power. The results of models trained on each set of statistics are shown below. A logical extension was a front end that displays game predictions and their accuracy.
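The experiment trains classifiers on different subsets of the same pitcher features and compares their accuracy on held-out games. Below is a minimal scikit-learn sketch of that kind of comparison. It uses the feature names and model classes listed under Model Results. It is not the project's actual code. The target column `home_team_won`, the 80/20 split, the random seed, and the default hyperparameters are all assumptions, and the DataFrame is random stand-in data. An 80/20 split of 1,673 games does yield the 1,338 training and 335 testing rows reported in the results.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NearestCentroid
from sklearn.svm import SVC

OLD_SCHOOL = [
    "pitcher_era_comp", "pitcher_win_percentage_comp", "pitcher_win_comp",
    "pitcher_losses_comp", "pitcher_innings_pitched_comp",
]
MODERN = [
    "pitcher_k_nine_comp", "pitcher_bb_nine_comp", "pitcher_k_bb_diff_comp",
    "pitcher_whip_comp", "pitcher_babip_comp", "pitcher_k_bb_ratio_comp",
]

# Feature subset and classifier class for each model listed in the results.
FEATURE_SETS = {
    "All Stats": (OLD_SCHOOL + MODERN, SVC),
    "Old-School Stats": (OLD_SCHOOL, KNeighborsClassifier),
    "Modern Stats": (MODERN, NearestCentroid),
}


def compare_feature_sets(games: pd.DataFrame, target: str = "home_team_won", seed: int = 0) -> dict:
    """Fit one classifier per feature subset on a shared 80/20 split; report test accuracy.

    ASSUMPTION: `home_team_won` is a hypothetical 0/1 target column name.
    """
    train, test = train_test_split(games, test_size=0.2, random_state=seed)
    results = {}
    for name, (columns, model_cls) in FEATURE_SETS.items():
        model = model_cls()  # library defaults; the project's real hyperparameters are unknown
        model.fit(train[columns], train[target])
        results[name] = {
            "accuracy": model.score(test[columns], test[target]),
            "train_size": len(train),
            "test_size": len(test),
        }
    return results


# Random stand-in data only to show the call shape; these accuracies mean nothing.
rng = np.random.default_rng(0)
fake_games = pd.DataFrame(rng.normal(size=(1673, 11)), columns=OLD_SCHOOL + MODERN)
fake_games["home_team_won"] = rng.integers(0, 2, size=1673)
results = compare_feature_sets(fake_games)
```

Reusing one split for every model means each accuracy is measured on the same set of games, so the numbers are directly comparable.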

To learn more about this effort, please visit the GitHub repo for this project to view source code, an architecture diagram, and further explanations.


As it turns out, predicting the winner of a baseball game is hard. It's even harder when you only use half a dozen or so statistics about the starting pitchers. These statistics hardly paint the whole picture of a game. They don't even account for the opposing team's batting, let alone the hundreds of other factors that affect a baseball game. I recognize these shortcomings, as well as the lackluster prediction results. I have considered adding more features to my data, but I would like to stick to statistics that are available before the game starts.
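Using only pre-game information means every feature comes from the starters' season-to-date lines. The sketch below derives five of the rate stats named in the feature lists (ERA, WHIP, K/9, BB/9, K/BB) from counting stats, using their standard formulas. It then combines the two starters into one row. `PitcherLine` and `comp_features` are hypothetical names. The page also does not say what `_comp` means, so treating each `_comp` feature as the home starter's value minus the away starter's is an assumption.

```python
from dataclasses import dataclass


@dataclass
class PitcherLine:
    """Hypothetical season-to-date counting stats for one starter, taken before first pitch."""

    outs: int  # innings stored as outs: "6.1 IP" in a box score means 6 1/3 innings = 19 outs
    earned_runs: int
    hits: int
    walks: int
    strikeouts: int

    @property
    def innings(self) -> float:
        return self.outs / 3

    def rates(self) -> dict[str, float]:
        """Standard rate stats; requires at least one recorded out."""
        if self.outs <= 0:
            raise ValueError("rate stats are undefined before the pitcher records an out")
        ip = self.innings
        return {
            "era": 9 * self.earned_runs / ip,
            "whip": (self.walks + self.hits) / ip,
            "k_nine": 9 * self.strikeouts / ip,
            "bb_nine": 9 * self.walks / ip,
            # K/BB is undefined for a pitcher with no walks; NaN keeps the row but flags it
            "k_bb_ratio": self.strikeouts / self.walks if self.walks else float("nan"),
        }


def comp_features(home: PitcherLine, away: PitcherLine) -> dict[str, float]:
    """ASSUMPTION: each *_comp feature is the home starter's value minus the away starter's."""
    home_rates, away_rates = home.rates(), away.rates()
    return {f"pitcher_{stat}_comp": home_rates[stat] - away_rates[stat] for stat in home_rates}


home = PitcherLine(outs=540, earned_runs=60, hits=160, walks=50, strikeouts=200)  # 180 IP
away = PitcherLine(outs=450, earned_runs=75, hits=160, walks=60, strikeouts=120)  # 150 IP
features = comp_features(home, away)
# features["pitcher_era_comp"] == -1.5 (ERA 3.00 vs 4.50)
```

Storing innings as outs avoids a common bug. Box-score notation like 6.1 means 6 1/3 innings, not 6.1, so treating it as a decimal skews every per-inning rate.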

Model Results

All Stats Model
Date Created: 2024-01-28 20:21:48.047509
Model Type: SVC
Parameters Used: pitcher_era_comp, pitcher_win_percentage_comp, pitcher_win_comp, pitcher_losses_comp, pitcher_innings_pitched_comp, pitcher_k_nine_comp, pitcher_bb_nine_comp, pitcher_k_bb_diff_comp, pitcher_whip_comp, pitcher_babip_comp, pitcher_k_bb_ratio_comp
Parameter Count: 11
Accuracy: 0.573134328358209
Training Set Size: 1338
Testing Set Size: 335

Old-School Stats Model
Date Created: 2024-01-28 20:21:48.047509
Model Type: KNeighborsClassifier
Parameters Used: pitcher_era_comp, pitcher_win_percentage_comp, pitcher_win_comp, pitcher_losses_comp, pitcher_innings_pitched_comp
Parameter Count: 5
Accuracy: 0.582089552238806
Training Set Size: 1338
Testing Set Size: 335

Modern Stats Model
Date Created: 2024-01-28 20:21:48.047509
Model Type: NearestCentroid
Parameters Used: pitcher_k_nine_comp, pitcher_bb_nine_comp, pitcher_k_bb_diff_comp, pitcher_whip_comp, pitcher_babip_comp, pitcher_k_bb_ratio_comp
Parameter Count: 6
Accuracy: 0.5611940298507463
Training Set Size: 1338
Testing Set Size: 335
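All three models were scored on the same 335 test games. That means each accuracy converts back to a count of correct picks. A rough normal-approximation (Wald) 95% interval then gives a sense of how much of the gap between models could be sampling noise. This is a back-of-the-envelope check, not part of the project's pipeline. Overlapping intervals are suggestive, not a formal significance test.

```python
import math

TEST_SIZE = 335  # testing set size reported for all three models

REPORTED_ACCURACY = {
    "All Stats (SVC)": 0.573134328358209,
    "Old-School Stats (KNeighborsClassifier)": 0.582089552238806,
    "Modern Stats (NearestCentroid)": 0.5611940298507463,
}


def correct_picks(accuracy: float, n: int = TEST_SIZE) -> int:
    """Convert a test accuracy back into a whole number of correctly predicted games."""
    return round(accuracy * n)


def wald_interval(accuracy: float, n: int = TEST_SIZE, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation 95% interval for a proportion measured on n games."""
    half_width = z * math.sqrt(accuracy * (1 - accuracy) / n)
    return accuracy - half_width, accuracy + half_width


summary = {
    name: {"correct": correct_picks(acc), "interval": wald_interval(acc)}
    for name, acc in REPORTED_ACCURACY.items()
}

for name, row in summary.items():
    low, high = row["interval"]
    print(f"{name}: {row['correct']}/{TEST_SIZE} correct, 95% CI {low:.3f} to {high:.3f}")
```

The reported accuracies work out to 192, 195, and 188 correct picks out of 335, a spread of seven games. Each interval is roughly ±0.053 wide, so every model's interval contains the other two models' accuracies.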