About

This project began as a way to test whether old-school baseball pitching statistics and newer, SABR-style (sabermetric) metrics differ in how well they predict game outcomes. The results of models trained on each set of statistics are listed below. I decided a logical extension of this would be a front end that displays game predictions and their accuracy.

To learn more about this effort, please visit the GitHub repo for this project to view source code, an architecture diagram, and further explanations.

Shortcomings

As it turns out, predicting the winner of a baseball game is hard. It's even harder when you only use five to eleven statistics about the starting pitchers. As one could imagine, these statistics hardly paint the whole picture of a baseball game. After all, they don't even account for the opposing team's hitting, not to mention the hundreds of other factors that go into a game. I recognize these shortcomings, as well as the lackluster prediction results. I have considered adding more features to my data, but I would like to stick to statistics that are available before the game starts.

Model Results

All Stats Model
Date Created	2024-01-28 20:21:48.047509
Model Type	SVC
Parameters Used	pitcher_era_comp, pitcher_win_percentage_comp, pitcher_win_comp, pitcher_losses_comp, pitcher_innings_pitched_comp, pitcher_k_nine_comp, pitcher_bb_nine_comp, pitcher_k_bb_diff_comp, pitcher_whip_comp, pitcher_babip_comp, pitcher_k_bb_ratio_comp
Parameter Count	11
Accuracy	0.573134328358209
Training Set Size	1338
Testing Set Size	335
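Each accuracy figure in these tables is the fraction of the 335 test games that a model called correctly, so multiplying it by the test set size recovers a whole number of games. The short Python sketch below does that arithmetic for all three models on this page, using only the numbers copied from the tables, and also checks the train/test split.

```python
# Accuracy values and set sizes copied from the model tables on this page.
TEST_SIZE = 335
TRAIN_SIZE = 1338

accuracies = {
    "All Stats (SVC)": 0.573134328358209,
    "Old-School Stats (KNeighborsClassifier)": 0.582089552238806,
    "Modern Stats (NearestCentroid)": 0.5611940298507463,
}

# accuracy = correct predictions / test games, so rounding accuracy * n recovers the count
correct = {name: round(acc * TEST_SIZE) for name, acc in accuracies.items()}

for name, n in correct.items():
    print(f"{name}: {n}/{TEST_SIZE} correct")

# Share of all games held out for testing
test_fraction = TEST_SIZE / (TRAIN_SIZE + TEST_SIZE)
print(f"test fraction: {test_fraction:.3f}")
```

This works out to 192, 195, and 188 correct games respectively, from an 80/20 split of 1,673 games. The best and worst models are separated by only seven test games, so on a test set this size the ranking between them should be read cautiously.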

Old-School Stats Model
Date Created	2024-01-28 20:21:48.047509
Model Type	KNeighborsClassifier
Parameters Used	pitcher_era_comp, pitcher_win_percentage_comp, pitcher_win_comp, pitcher_losses_comp, pitcher_innings_pitched_comp
Parameter Count	5
Accuracy	0.582089552238806
Training Set Size	1338
Testing Set Size	335
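For readers unfamiliar with the old-school model's classifier, k-nearest neighbors predicts a game by finding the k most similar past games and taking a majority vote of their outcomes. The following is a minimal stdlib sketch of that idea, not this project's code. It assumes, hypothetically, that each `_comp` feature is a home-minus-away comparison and that label 1 means the home team won; the training rows are invented for illustration, and the project's actual value of k is not stated here (scikit-learn's `KNeighborsClassifier` defaults to five neighbors).

```python
import math
from collections import Counter

# Hypothetical training rows: ((pitcher_era_comp, pitcher_win_percentage_comp), label).
# Values are made up; label 1 = home team won, 0 = away team won.
TRAIN = [
    ((-1.2, 0.10), 1),
    ((-0.8, 0.05), 1),
    ((0.9, -0.08), 0),
    ((1.1, -0.12), 0),
    ((0.2, 0.01), 1),
]

def knn_predict(train, x, k=3):
    """Predict a label by majority vote among the k nearest training rows."""
    # Sort training rows by Euclidean distance to the query point, keep the k closest
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    # With an odd k and two labels there can be no tie
    return votes.most_common(1)[0][0]

print(knn_predict(TRAIN, (-1.0, 0.08)))  # → 1
```

An odd k avoids tied votes in a two-outcome problem like win/loss.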

Modern Stats Model
Date Created	2024-01-28 20:21:48.047509
Model Type	NearestCentroid
Parameters Used	pitcher_k_nine_comp, pitcher_bb_nine_comp, pitcher_k_bb_diff_comp, pitcher_whip_comp, pitcher_babip_comp, pitcher_k_bb_ratio_comp
Parameter Count	6
Accuracy	0.5611940298507463
Training Set Size	1338
Testing Set Size	335
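The modern-stats model's nearest-centroid classifier is even simpler: it averages the feature vectors of all home wins and of all home losses, then assigns a new game to whichever average (centroid) is closer. The sketch below illustrates that technique with the standard library under the same hypothetical assumptions as before (home-minus-away `_comp` features, label 1 for a home win, invented training values). It uses Euclidean distance, which is also the default metric of scikit-learn's `NearestCentroid`, but it is not this project's implementation.

```python
import math
from statistics import fmean

# Hypothetical training rows: ((pitcher_whip_comp, pitcher_k_bb_ratio_comp), label).
# Values are made up; label 1 = home team won, 0 = away team won.
TRAIN = [
    ((-0.20, 1.5), 1),
    ((-0.10, 0.9), 1),
    ((0.15, -1.1), 0),
    ((0.25, -0.7), 0),
]

def fit_centroids(rows):
    """Return the per-feature mean (centroid) of each class."""
    by_label = {}
    for features, label in rows:
        by_label.setdefault(label, []).append(features)
    # zip(*vectors) turns a list of feature tuples into per-feature columns
    return {
        label: tuple(fmean(col) for col in zip(*vectors))
        for label, vectors in by_label.items()
    }

def predict(centroids, x):
    """Assign x to the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))

centroids = fit_centroids(TRAIN)
print(predict(centroids, (-0.05, 0.4)))  # → 1
```

Because each class is summarized by a single average point, the model has very little capacity, which is one plausible reason it can trail the other two models on this data.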