I have thought for a long time that Retrosheet.org was a pretty neat web site with the potential for a ton of baseball information.
They have loads of stats on their site, but the data can be hard to get at sometimes. As a result, I thought I might do some coding against their data just for fun. (I am doing this mainly because I am bored, because the Steelers are not on TV here in Ohio today, and because cable does not carry the NFL Sunday Ticket.)
The first task is to take their event data files and get them into a format that can be easily read and parsed. The Retrosheet web site has zip files that contain entire seasons' worth of event data. Just go to retrosheet.org, hover over Data downloads, and select Play-by-play files.
However, once you look at these event files, you discover that they need a little interpreting to get them into a nicer format for study. Luckily, Tom Tippett, David Nichols, and David W. Smith wrote a DOS application that does this. You just need to run this BEVENT.EXE file on each of the event files.
And of course, I couldn’t simply do that. I had to write a VB.NET console application that creates a batch file to automate this.
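' Body of Sub Main. Needs Imports System.IO (and System.Linq for ToList).
Console.Write("Enter the directory of your event files: ")
Dim d As String = Console.ReadLine()
If d.Trim = "" OrElse Not Directory.Exists(d) Then GoTo App_end
If Right(d, 1) <> "\" Then d = d + "\"
' Collect every event file in the directory (extensions matching .ev?).
Dim eventFilenames As List(Of String)
eventFilenames = Directory.GetFiles(d, "*.ev?").ToList
Dim s, cmd As String
Dim sw As StreamWriter = New StreamWriter(d + "BEventHelper.bat")
For Each f In eventFilenames
    s = Path.GetFileName(f)
    ' The first four characters of the file name are passed to BEVENT as the year.
    cmd = "bevent.exe -f 0-96 -y " + Left(s, 4) + " " + s + " > " + s + ".csv"
    ' Write one BEVENT command per event file into the batch file.
    sw.WriteLine(cmd)
Next
sw.Close()
App_end:
Console.WriteLine("Press any key to end application...")
Console.ReadKey()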
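To make it clearer what ends up in the batch file, here is a small sketch that builds a single command line the same way the loop above does. The module name, the BuildBeventCommand helper, and the file name 2010ANA.EVA are made up for illustration; only the command format comes from the code above.

```vbnet
Imports System

Module BeventCommandSketch
    ' Hypothetical helper: builds one batch-file line for a single event file.
    Function BuildBeventCommand(eventFileName As String) As String
        ' Assumption: the first four characters of the file name are the season year.
        Dim season As String = eventFileName.Substring(0, 4)
        Return "bevent.exe -f 0-96 -y " & season & " " & eventFileName &
               " > " & eventFileName & ".csv"
    End Function

    Sub Main()
        ' 2010ANA.EVA is only an example file name.
        Console.WriteLine(BuildBeventCommand("2010ANA.EVA"))
        ' Prints: bevent.exe -f 0-96 -y 2010 2010ANA.EVA > 2010ANA.EVA.csv
    End Sub
End Module
```

Note that the output name keeps the original extension, so each CSV file is named after its event file with .csv tacked onto the end.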
So here is what you do. Download a season zip file (find them on the Play-by-Play Data Files page) and unzip the files into a directory. Download the bevent.zip file from the Software tools page, unzip it, and place the BEVENT.EXE file into the directory with all the event files. Then run the console application and enter the directory name at the prompt. When it is all said and done, you have a BEventHelper.bat file in the directory. Run this file from that same directory, since its commands use bare file names, and you will end up with CSV files that correspond to the event files.
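Since those steps are easy to get slightly wrong, a quick check of the directory before generating the batch file might help. This is only a sketch and not part of the original application; the module and the FindSetupProblems helper are hypothetical names, and it assumes BEVENT.EXE sits in the same directory as the event files, as described above.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module PreflightSketch
    ' Hypothetical helper: returns a list of problems, empty if the directory looks ready.
    Function FindSetupProblems(folder As String) As List(Of String)
        Dim problems As New List(Of String)
        If Not Directory.Exists(folder) Then
            problems.Add("Directory not found: " & folder)
            Return problems
        End If
        ' BEVENT.EXE has to be next to the event files for the batch file to work.
        If Not File.Exists(Path.Combine(folder, "BEVENT.EXE")) Then
            problems.Add("BEVENT.EXE is not in the directory")
        End If
        ' Same search pattern the console application uses.
        If Directory.GetFiles(folder, "*.ev?").Length = 0 Then
            problems.Add("No event files (*.ev?) found")
        End If
        Return problems
    End Function

    Sub Main(args As String())
        ' Checks the directory given on the command line, or the current one.
        Dim folder As String = If(args.Length > 0, args(0), ".")
        For Each problem In FindSetupProblems(folder)
            Console.WriteLine(problem)
        Next
    End Sub
End Module
```

If nothing is printed, the directory has both BEVENT.EXE and at least one event file in it.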
In the next episode, I will begin to read in the data and crank out some preliminary statistics.
Disclaimer: The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at “www.retrosheet.org”.