Downloading The Data

Baseball scoreboard. Photo by Jason Weingardt on Unsplash.

Data Download Tool

Our pitch dataset is available for anyone to use through our Data Download Tool. This pitch data is the main source for the analysis and modeling behind the Strike Zone Explorer.

Most of the data comes from Baseball Savant and is joined with data from other sources to provide complete umpire and player data, as described above. Some data sources throttle requests to limit how much data can be downloaded at once, and, unfortunately, they give no errors or warnings when the downloaded data is incomplete.
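
Because a throttled source can silently cut off a response, one defensive pattern is to request data in date windows and treat any response that reaches the row limit as possibly truncated. The Python sketch below assumes a caller-supplied `fetch(start, end)` function and a per-request row cap `cap`; both are hypothetical placeholders rather than the API of Baseball Savant or any other source, and this is not necessarily how our own download pipeline works.

```python
from datetime import date, timedelta
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]
Window = Tuple[date, date]


def fetch_complete(
    fetch: Callable[[date, date], List[Row]],
    start: date,
    end: date,
    cap: int,
) -> Tuple[List[Row], List[Window]]:
    """Fetch every row between start and end (both inclusive).

    `fetch` is a hypothetical, caller-supplied function that downloads one
    date window. The source is assumed to stop silently at `cap` rows, so a
    response that reaches `cap` is treated as possibly truncated: the window
    is split in half and each half is requested again. A single-day window
    that still reaches the cap cannot be split further and is reported.
    """
    rows: List[Row] = []
    suspect: List[Window] = []

    def pull(lo: date, hi: date) -> None:
        batch = fetch(lo, hi)
        if len(batch) < cap:
            rows.extend(batch)  # under the cap: assume the window is complete
        elif lo < hi:
            # Discard the possibly truncated batch and retry each half.
            mid = lo + timedelta(days=(hi - lo).days // 2)
            pull(lo, mid)
            pull(mid + timedelta(days=1), hi)
        else:
            rows.extend(batch)  # keep what we got, but flag the day
            suspect.append((lo, hi))

    pull(start, end)
    return rows, suspect


if __name__ == "__main__":
    # Fake source: 10 pitches per day, silently truncated to 25 rows per request.
    def fake_fetch(lo: date, hi: date) -> List[Row]:
        days = (hi - lo).days + 1
        data = [{"game_date": str(lo + timedelta(days=d)), "n": str(i)}
                for d in range(days) for i in range(10)]
        return data[:25]

    rows, suspect = fetch_complete(fake_fetch, date(2021, 4, 1), date(2021, 4, 7), cap=25)
    print(len(rows), suspect)  # prints: 70 []
```

Halving only the windows that hit the cap keeps the number of extra requests small when just a few busy date ranges are affected; any single day that still reaches the cap comes back in `suspect` for manual follow-up.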

Downloading and cleaning the data took a significant amount of work, and we hope baseball fans and data scientists can benefit from it. The data can be downloaded as full years or filtered by any combination of years, pitchers, batters, and umpires.

Full years of data are served from pre-built ZIP files of CSVs. Each year is split into three files to keep individual downloads smaller and reduce download errors.
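
Once a yearly ZIP is downloaded, its CSV parts can be combined into a single table. A minimal Python sketch using only the standard library, assuming every part is a UTF-8 CSV with the same header row (the file names inside the ZIP are not assumed):

```python
import csv
import io
import zipfile
from typing import BinaryIO, Dict, List, Optional, Sequence, Union


def read_year_zip(source: Union[str, BinaryIO]) -> List[Dict[str, str]]:
    """Read every CSV inside one yearly ZIP and return all rows as one list.

    Assumes (hypothetically) that the ZIP holds several CSV parts sharing the
    same header row, for example the three files provided per year.
    """
    rows: List[Dict[str, str]] = []
    header: Optional[Sequence[str]] = None
    with zipfile.ZipFile(source) as archive:
        # Sort names so the parts are always read in a stable order.
        for name in sorted(archive.namelist()):
            if not name.lower().endswith(".csv"):
                continue
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                reader = csv.DictReader(text)
                if header is None:
                    header = reader.fieldnames
                elif reader.fieldnames != header:
                    raise ValueError(f"{name} has a different header than the first part")
                rows.extend(reader)
    return rows
```

For example, `rows = read_year_zip("pitches_2021.zip")` would load one year, where `pitches_2021.zip` is a hypothetical file name. The header check catches parts with mismatched column layouts before they are silently merged.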

When individual batters, pitchers, or umpires are selected, the tool runs a SQL query against our AWS database and provides a link to download the resulting CSV file.
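
The per-selection path can be pictured as a parameterized query built from whatever filters the user picked. The Python sketch below uses the standard library's `sqlite3` module as a stand-in for the AWS-hosted database, and the `pitches` table and its `game_year`, `pitcher`, `batter`, and `umpire` columns are hypothetical names; it illustrates the technique and is not the tool's actual query.

```python
import csv
import io
import sqlite3
from typing import List, Optional, Sequence, Tuple

# HYPOTHETICAL schema: one `pitches` table with these filterable columns.
FILTER_COLUMNS = ("game_year", "pitcher", "batter", "umpire")


def build_query(**selections: Optional[Sequence]) -> Tuple[str, List]:
    """Build a parameterized SELECT for any combination of selections.

    Each keyword maps a filter column to the values the user picked; missing
    or empty selections leave that column unfiltered. Values are bound as
    parameters, never pasted into the SQL text.
    """
    unknown = set(selections) - set(FILTER_COLUMNS)
    if unknown:
        raise ValueError(f"unknown filter(s): {sorted(unknown)}")
    clauses: List[str] = []
    params: List = []
    for column in FILTER_COLUMNS:
        values = selections.get(column)
        if values:
            placeholders = ", ".join("?" for _ in values)
            clauses.append(f"{column} IN ({placeholders})")
            params.extend(values)
    sql = "SELECT * FROM pitches"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


def query_to_csv(conn: sqlite3.Connection, **selections: Optional[Sequence]) -> str:
    """Run the filtered query and return the result as CSV text, header first."""
    sql, params = build_query(**selections)
    cursor = conn.execute(sql, params)
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)
    return out.getvalue()
```

Binding values as parameters instead of formatting them into the SQL string avoids injection and quoting bugs. Other database drivers use different placeholder styles (for example `%s` in psycopg2), so the `?` markers would need adjusting for a different engine.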

Screen capture of the Data Download Tool with an example selection of data.

We hope you find the Data Download Tool useful for your own baseball analytics work. We aim to expand the available data over time and will keep you informed of our progress.

In the meantime, please take a look at the last of our BaseballML.com launch posts, which discusses next steps and lessons learned.