Standard Dataset
MLB pitcher data
- Submitted by: Wonbyung Lee
- Last updated: Thu, 02/06/2025 - 04:20
- DOI: 10.21227/se93-a006
Abstract
In this study, we analyzed MLB pitcher data using datasets derived from Statcast. Introduced in 2015, Statcast is an advanced tracking technology that captures detailed pitch-tracking, bat-tracking, and related information in every MLB game. All 30 MLB ballparks are currently equipped with Statcast systems, and since the introduction of Hawk-Eye technology in 2020, the system has achieved a tracking rate of 99% for batted balls. Statcast data include precise tracking details, such as pitch trajectories, spin rates, and other metrics, which can be segmented by pitch type and location. Because the experiment focused on predicting a pitcher's ERA, we selected features specifically related to pitching performance. The dataset was constructed from data on the official MLB Statcast site and covers regular-season records from 2015 to 2023. Postseason games were excluded from the analysis.
All variables in these datasets are baseball-related, and each is either continuous or categorical. Appropriate normalization may be required before model training.
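As a rough illustration of the normalization mentioned above, the following sketch standardizes continuous columns (z-score) and one-hot encodes categorical ones using only the Python standard library. The column names (`spin_rate`, `velocity`, `pitch_type`) and the sample rows are hypothetical placeholders, not the dataset's actual schema or values, and z-scoring is only one of several reasonable choices.

```python
from statistics import mean, pstdev

# Hypothetical rows: column names and values are illustrative only,
# not taken from the actual dataset.
rows = [
    {"spin_rate": 2200.0, "velocity": 93.5, "pitch_type": "FF"},
    {"spin_rate": 2500.0, "velocity": 85.0, "pitch_type": "SL"},
    {"spin_rate": 2350.0, "velocity": 88.0, "pitch_type": "FF"},
]


def zscore_column(rows, key):
    """Return z-scores (mean 0, unit population variance) for one continuous column."""
    values = [r[key] for r in rows]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot_column(rows, key):
    """Return one-hot vectors for a categorical column and the category order used."""
    categories = sorted({r[key] for r in rows})
    vectors = [[1 if r[key] == c else 0 for c in categories] for r in rows]
    return vectors, categories


def preprocess(rows, continuous, categorical):
    """Build one numeric feature vector per row: scaled columns, then one-hot columns."""
    scaled = {k: zscore_column(rows, k) for k in continuous}
    encoded = {k: one_hot_column(rows, k)[0] for k in categorical}
    features = []
    for i in range(len(rows)):
        vec = [scaled[k][i] for k in continuous]
        for k in categorical:
            vec.extend(encoded[k][i])
        features.append(vec)
    return features


features = preprocess(rows, ["spin_rate", "velocity"], ["pitch_type"])
```

When the data are split for model training, the mean and standard deviation would normally be computed on the training portion only and then reused for validation and test rows, so that no information leaks across the split.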