Posts tagged ‘Retrosheet’

Batter vs. Pitcher app released

Well, I have finally released my Batter vs. Pitcher app that I have been working on for months now. The app is all about baseball statistics, so if you are a fan of baseball, please check it out:

Batter vs. Pitcher

BTW, Happy Birthday to John Kundla, former NBA coaching great. (I am not an NBA fan, but I could not find anyone else I wanted to mention on the Wikipedia page for July 3, and he was the oldest person on the births list who had not passed away.)

Cool baseball app

It’s about time that somebody did something awesome with the Retrosheet data…

Pennant for iPad (link removed, app is no longer available)

I wish I had an iPad.

BTW, happy birthday to Joel Hodgson, one of the true geniuses of our time.

Deuces wild???

As a baseball-loving youth, I remember with great fondness watching and listening to games called by the voice of baseball, Vin Scully, who is unquestionably one of the greatest baseball announcers.

One thing I always remember him saying during his calls was “deuces wild”, which he used to describe a count of 2 balls and 2 strikes with 2 outs.

So I figured I would look at the raw statistics to see how often the deuces wild situation actually comes up. Here is the code from my Program.cs file:

using System;
using System.Collections.Generic;
using System.Linq;
 
namespace RetrosheetReader
{
    class Program
    {
        static List<Team> teamList;
        static List<Player> playerList;
        static List<Event> eventList;
 
        const string DATA = "c:\\baseball_data\\";
 
        static void Main(string[] args)
        {
            Console.WriteLine("Retrosheet Reader");
            Console.WriteLine();
 
            teamList = Team.GetTeamList(DATA);
            playerList = Player.GetPlayerList(DATA);
            eventList = Event.GetEventList(DATA);
 
            Console.WriteLine("Number of teams: " + teamList.Count().ToString());
            Console.WriteLine("Number of players: " + playerList.Count().ToString());
            Console.WriteLine("Number of events: " + eventList.Count().ToString());
 
            int[, ,] pitchCount = new int[4, 3, 3];
            int balls, strikes, totalPitches;
 
            totalPitches = 0;
            foreach (var ev in eventList)
            {
                balls = 0;
                strikes = 0;
                foreach (char c in ev.pitchSequence)
                {
                    if (Functions.IsBallOrStrike(c))
                    {
                        if (balls > 3 || strikes > 2)
                        {
                            // Guard both dimensions so bad data cannot index outside pitchCount[4, 3, 3]
                            Console.WriteLine("Illegal count (" + balls.ToString() + "-" + strikes.ToString() + ") in pitch sequence " + ev.pitchSequence);
                        }
                        else
                        {
                            pitchCount[balls, strikes, ev.outs]++;
                            totalPitches++;
                            if (Functions.IsStrike(c))
                            {
                                if (strikes == 2)
                                {
                                    if (!Functions.IsFoul(c))
                                        strikes++;
                                }
                                else
                                    strikes++;
                            }
                            else
                            {
                                balls++;
                            }
                        }
                    }
                }
            }
 
            Console.WriteLine("Total pitches: " + totalPitches.ToString());
            for (int o = 0; o < 3; o++)
                for (int s = 0; s < 3; s++)
                    for (int b = 0; b < 4; b++)
                        Console.WriteLine(String.Format("Total pitches on B{0}-S{1}-O{2}: {3,8}  ({4,6:P})", b, s, o, pitchCount[b, s, o],
                                                    pitchCount[b, s, o] * 1.0 / totalPitches));
 
            Console.WriteLine();
            Console.Write("Strike any key to end...");
            Console.ReadKey();
        }
    }
}
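The Functions helpers used above are not listed in this post, so here is a self-contained sketch of the same count-tracking idea for anyone who wants to experiment without the zip. The pitch-code groupings are my own reading of the Retrosheet pitch codes (B, I, P, V as balls; C, K, L, M, O, Q, S, T as strikes; F and R as fouls that cannot add a third strike; H, X, Y as pitches that end the plate appearance), and the class and method names are hypothetical, not the ones in the project.

```csharp
using System.Collections.Generic;

// Hypothetical helper, NOT the project's Functions.cs: tracks the ball-strike
// count in effect for every pitch in a Retrosheet pitch sequence string.
public static class PitchCountSketch
{
    // Assumed code groupings, based on the Retrosheet pitch-code list.
    const string BallCodes = "BIPV";        // ball, intentional ball, pitchout, called ball (pitcher to mouth)
    const string StrikeCodes = "CKLMOQST";  // strikes that count even with 2 strikes (foul bunt, foul tip, ...)
    const string FoulCodes = "FR";          // fouls: a strike only with fewer than 2 strikes
    const string EndingCodes = "HXY";       // hit batter or ball in play: a pitch that leaves the count alone

    // Returns the (balls, strikes) count at the moment each pitch was thrown.
    public static List<(int Balls, int Strikes)> CountsBeforeEachPitch(string sequence)
    {
        var counts = new List<(int Balls, int Strikes)>();
        int balls = 0, strikes = 0;

        foreach (char c in sequence)
        {
            bool isBall = BallCodes.IndexOf(c) >= 0;
            bool isStrike = StrikeCodes.IndexOf(c) >= 0;
            bool isFoul = FoulCodes.IndexOf(c) >= 0;
            bool isEnding = EndingCodes.IndexOf(c) >= 0;

            // Skip markers that are not pitches (pickoff throws, runner going, etc.)
            if (!(isBall || isStrike || isFoul || isEnding))
                continue;

            // Stop rather than record an impossible count from bad data
            if (balls > 3 || strikes > 2)
                break;

            counts.Add((balls, strikes));

            if (isBall) balls++;
            else if (isStrike) strikes++;
            else if (isFoul && strikes < 2) strikes++;
        }
        return counts;
    }

    // "Deuces wild": pitches thrown with 2 balls, 2 strikes, and 2 outs.
    public static int DeucesWildPitches(string sequence, int outs)
    {
        if (outs != 2) return 0;
        int n = 0;
        foreach (var (b, s) in CountsBeforeEachPitch(sequence))
            if (b == 2 && s == 2) n++;
        return n;
    }
}
```

For example, "BBCFFX" with 2 outs yields two deuces wild pitches: the first foul at 2-2 and the ball put in play after it, since a foul with two strikes leaves the count unchanged.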

By the way, the project now has a new Functions.cs file that contains utility functions. I also had to modify the Event.cs file to account for the fact that the BEVENT application creates duplicate records for a batter when some kind of on-base event, such as a stolen base or pickoff, happens in the middle of the at-bat.
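The modified Event.cs is not reproduced in this post. One possible way to avoid double-counting, assuming (as I understand the BEVENT output) that the last record of a plate appearance repeats the full pitch sequence and is the one with the batter event flag (field 35) set to T, is to count pitches only from batter events. The PitchRecord type and method name below are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal record holding just the two BEVENT fields this sketch needs.
public sealed class PitchRecord
{
    public string PitchSequence { get; }
    public bool BatterEventFlag { get; }

    public PitchRecord(string pitchSequence, bool batterEventFlag)
    {
        PitchSequence = pitchSequence;
        BatterEventFlag = batterEventFlag;
    }
}

public static class DuplicateFilterSketch
{
    // Keep only records that end a plate appearance (batter event flag = T).
    // Assumption: a mid-at-bat record (stolen base, pickoff, ...) carries a prefix
    // of the pitch sequence that the batter's final record also contains.
    public static List<PitchRecord> BatterEventsOnly(IEnumerable<PitchRecord> records) =>
        records.Where(r => r.BatterEventFlag).ToList();
}
```

With a stolen base after "BC>B" and the at-bat ending on "BC>B.CX", only the second record survives, so the first three pitches are counted once.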

The results? I ran the application with the 2008 season data and found 700,242 total pitches. The deuces wild situation came up on only 17,141 of them, or 2.45%. Of course, the highest percentage occurred with 0 balls, 0 strikes, and 0 outs (the first pitch to every batter who comes up before the first out of an inning, including each inning's leadoff batter), with 65,050 pitches, or 9.29%.
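Each of those percentages is just one cell of the [balls, strikes, outs] array divided by the total: 17,141 / 700,242 ≈ 0.0245 and 65,050 / 700,242 ≈ 0.0929. The sketch below (helper names are hypothetical) reproduces those numbers and shows one way to locate the busiest count in an array shaped like the pitchCount array in Program.cs.

```csharp
using System;

// Hypothetical helpers for summarizing a pitchCount[balls, strikes, outs] array.
public static class CountSummarySketch
{
    // Share of all pitches as a percentage rounded to two decimals (e.g. 2.45).
    public static double PercentOf(int part, int total) =>
        total == 0 ? 0.0 : Math.Round(part * 100.0 / total, 2);

    // Finds the (balls, strikes, outs) cell with the most pitches; ties keep the earlier cell.
    public static (int Balls, int Strikes, int Outs) Busiest(int[,,] pitchCount)
    {
        var best = (Balls: 0, Strikes: 0, Outs: 0);
        for (int b = 0; b < pitchCount.GetLength(0); b++)
            for (int s = 0; s < pitchCount.GetLength(1); s++)
                for (int o = 0; o < pitchCount.GetLength(2); o++)
                    if (pitchCount[b, s, o] > pitchCount[best.Balls, best.Strikes, best.Outs])
                        best = (b, s, o);
        return best;
    }
}
```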

Here is the zipped up solution:

RetrosheetReader.zip

Retrosheet event record mapping completed

The Retrosheet project has kind of sat around untouched for a while, so I figured I would finish up the reading of the event records. I have also decided to break the project into separate files, since keeping it all in one file would make it obscenely long.

So, here is the main Program.cs file:

Program.cs

using System;
using System.Collections.Generic;
using System.Linq;
 
namespace RetrosheetReader
{
    class Program
    {
        static List<Team> teamList;
        static List<Player> playerList;
        static List<Event> eventList;
 
        const string DATA = "c:\\baseball_data\\";
 
        static void Main(string[] args)
        {
            Console.WriteLine("Retrosheet Reader");
            Console.WriteLine();
 
            teamList = Team.GetTeamList(DATA);
            playerList = Player.GetPlayerList(DATA);
            eventList = Event.GetEventList(DATA);
 
            Console.WriteLine("Number of teams: " + teamList.Count().ToString());
            Console.WriteLine("Number of players: " + playerList.Count().ToString());
            Console.WriteLine("Number of events: " + eventList.Count().ToString());
 
            Console.WriteLine();
            Console.Write("Strike any key to end...");
            Console.ReadKey();
        }
    }
}

There are now 3 class files, one each for a team, player, and event. The event file is the one with the most new stuff going on.

Team.cs

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
 
namespace RetrosheetReader
{
    class Team
    {
        int year;
        string city;
        string nickname;
        string abbreviation;
        string league;
 
        public Team(int y, string[] a)
        {
            year = y;
            abbreviation = a[0];
            league = a[1];
            city = a[2];
            nickname = a[3];
        }
 
        public static List<Team> GetTeamList(string dir)
        {
            int y;
            string s;
            string[] splitLine;
            List<Team> teamList = new List<Team>();
 
            List<string> tfs = Directory.GetFiles(dir, "team*").ToList();
            foreach (string tf in tfs)
            {
                y = Convert.ToInt32(Path.GetFileName(tf).Substring(4));
                using (StreamReader sr = new StreamReader(tf))
                {
                    // Dispose the reader so each file handle is released promptly
                    while ((s = sr.ReadLine()) != null)
                    {
                        splitLine = s.Split(',');
                        if (splitLine.Count() == 4)
                        {
                            teamList.Add(new Team(y, splitLine));
                        }
                    }
                }
            }
 
            return teamList;
        }
    }
}

Player.cs

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
 
namespace RetrosheetReader
{
    class Player
    {
        int year;
        string team;
        string playerID;
        string firstName;
        string lastName;
        string bats;
        string throws;
        string position;
 
        public Player(int y, string[] a)
        {
            year = y;
            playerID = a[0];
            lastName = a[1];
            firstName = a[2];
            bats = a[3];
            throws = a[4];
            team = a[5];
            position = a[6];
        }
 
        public static List<Player> GetPlayerList(string dir)
        {
            int y;
            string s;
            string[] splitLine;
            List<Player> playerList = new List<Player>();
 
            List<string> rfs = Directory.GetFiles(dir, "*.ros").ToList();
            foreach (string rf in rfs)
            {
                y = Convert.ToInt32(Path.GetFileName(rf).Substring(3).Split('.')[0]);
                using (StreamReader sr = new StreamReader(rf))
                {
                    // Dispose the reader so each file handle is released promptly
                    while ((s = sr.ReadLine()) != null)
                    {
                        splitLine = s.Split(',');
                        if (splitLine.Count() == 7)
                        {
                            playerList.Add(new Player(y, splitLine));
                        }
                    }
                }
            }
 
            return playerList;
        }
    }
}

Event.cs

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
 
namespace RetrosheetReader
{
    class Event
    {
        string gameID;          // 0, A
        string visitingTeam;    // 1, B
        int inning;             // 2, C
        int battingTeam;        // 3, D
        int outs;               // 4, E
        int balls;              // 5, F
        int strikes;            // 6, G
        string pitchSequence;   // 7, H
        int visitorScore;       // 8, I
        int homeScore;          // 9, J
        string batter;          // 10, K
        string batterHand;      // 11, L
        string resBatter;       // 12, M
        string resBatterHand;   // 13, N
        string pitcher;         // 14, O
        string pitcherHand;     // 15, P
        string resPitcher;      // 16, Q
        string resPitcherHand;  // 17, R
        string catcher;         // 18, S
        string firstBase;       // 19, T
        string secondBase;      // 20, U
        string thirdBase;       // 21, V
        string shortstop;       // 22, W
        string leftField;       // 23, X
        string centerField;     // 24, Y
        string rightField;      // 25, Z
        string firstRunner;     // 26, AA
        string secondRunner;    // 27, AB
        string thirdRunner;     // 28, AC
        string eventText;       // 29, AD
        bool leadoffFlag;       // 30, AE
        bool pinchHitFlag;      // 31, AF
        int defensivePosition;  // 32, AG
        int lineupPosition;     // 33, AH
        int eventType;          // 34, AI
        bool batterEventFlag;   // 35, AJ
        bool abFlag;            // 36, AK
        int hitValue;           // 37, AL
        bool shFlag;            // 38, AM
        bool sfFlag;            // 39, AN
        int outsOnPlay;         // 40, AO
        bool doublePlayFlag;    // 41, AP
        bool triplePlayFlag;    // 42, AQ
        int rbiOnPlay;          // 43, AR
        bool wildPitchFlag;     // 44, AS
        bool passedBallFlag;    // 45, AT
        int fieldedBy;          // 46, AU
        string battedBallType;  // 47, AV
        bool buntFlag;          // 48, AW
        bool foulFlag;          // 49, AX
        string hitLocation;     // 50, AY
        int numErrors;          // 51, AZ
        int firstErrorPlayer;   // 52, BA
        string firstErrorType;  // 53, BB
        int secondErrorPlayer;  // 54, BC
        string secondErrorType; // 55, BD
        int thirdErrorPlayer;   // 56, BE
        string thirdErrorType;  // 57, BF
        int batterDest;         // 58, BG
        int runner1Dest;        // 59, BH
        int runner2Dest;        // 60, BI
        int runner3Dest;        // 61, BJ
        string playOnBatter;    // 62, BK
        string playOnRunner1;   // 63, BL
        string playOnRunner2;   // 64, BM
        string playOnRunner3;   // 65, BN
        bool sbRunner1Flag;     // 66, BO
        bool sbRunner2Flag;     // 67, BP
        bool sbRunner3Flag;     // 68, BQ
        bool csRunner1Flag;     // 69, BR
        bool csRunner2Flag;     // 70, BS
        bool csRunner3Flag;     // 71, BT
        bool poRunner1Flag;     // 72, BU
        bool poRunner2Flag;     // 73, BV
        bool poRunner3Flag;     // 74, BW
        string respPitcher1;    // 75, BX
        string respPitcher2;    // 76, BY
        string respPitcher3;    // 77, BZ
        bool newGameFlag;       // 78, CA
        bool endGameFlag;       // 79, CB
        bool pinchRunner1;      // 80, CC
        bool pinchRunner2;      // 81, CD
        bool pinchRunner3;      // 82, CE
        string removedForPR1;   // 83, CF
        string removedForPR2;   // 84, CG
        string removedForPR3;   // 85, CH
        string removedForPH;    // 86, CI
        int posRemovedForPH;    // 87, CJ
        int fielderWithPO1;     // 88, CK
        int fielderWithPO2;     // 89, CL
        int fielderWithPO3;     // 90, CM
        int fielderWithA1;      // 91, CN
        int fielderWithA2;      // 92, CO
        int fielderWithA3;      // 93, CP
        int fielderWithA4;      // 94, CQ
        int fielderWithA5;      // 95, CR
        int eventNum;           // 96, CS
 
        public Event(string[] a)
        {
            gameID = a[0].Replace("\"", "");
            visitingTeam = a[1].Replace("\"", "");
            inning = Convert.ToInt32(a[2]);
            battingTeam = Convert.ToInt32(a[3]);
            outs = Convert.ToInt32(a[4]);
            balls = Convert.ToInt32(a[5]);
            strikes = Convert.ToInt32(a[6]);
            pitchSequence = a[7].Replace("\"", "");
            visitorScore = Convert.ToInt32(a[8]);
            homeScore = Convert.ToInt32(a[9]);
            batter = a[10].Replace("\"", "");
            batterHand = a[11].Replace("\"", "");
            resBatter = a[12].Replace("\"", "");
            resBatterHand = a[13].Replace("\"", "");
            pitcher = a[14].Replace("\"", "");
            pitcherHand = a[15].Replace("\"", "");
            resPitcher = a[16].Replace("\"", "");
            resPitcherHand = a[17].Replace("\"", "");
            catcher = a[18].Replace("\"", "");
            firstBase = a[19].Replace("\"", "");
            secondBase = a[20].Replace("\"", "");
            thirdBase = a[21].Replace("\"", "");
            shortstop = a[22].Replace("\"", "");
            leftField = a[23].Replace("\"", "");
            centerField = a[24].Replace("\"", "");
            rightField = a[25].Replace("\"", "");
            firstRunner = a[26].Replace("\"", "");
            secondRunner = a[27].Replace("\"", "");
            thirdRunner = a[28].Replace("\"", "");
            eventText = a[29].Replace("\"", "");
            leadoffFlag = a[30].Contains('T');
            pinchHitFlag = a[31].Contains('T');
            defensivePosition = Convert.ToInt32(a[32]);
            lineupPosition = Convert.ToInt32(a[33]);
            eventType = Convert.ToInt32(a[34]);
            batterEventFlag = a[35].Contains('T');
            abFlag = a[36].Contains('T');
            hitValue = Convert.ToInt32(a[37]);
            shFlag = a[38].Contains('T');
            sfFlag = a[39].Contains('T');
            outsOnPlay = Convert.ToInt32(a[40]);
            doublePlayFlag = a[41].Contains('T');
            triplePlayFlag = a[42].Contains('T');
            rbiOnPlay = Convert.ToInt32(a[43]);
            wildPitchFlag = a[44].Contains('T');
            passedBallFlag = a[45].Contains('T');
            fieldedBy = Convert.ToInt32(a[46]);
            battedBallType = a[47].Replace("\"", "");
            buntFlag = a[48].Contains('T');
            foulFlag = a[49].Contains('T');
            hitLocation = a[50].Replace("\"", "");
            numErrors = Convert.ToInt32(a[51]);
            firstErrorPlayer = Convert.ToInt32(a[52]);
            firstErrorType = a[53].Replace("\"", "");
            secondErrorPlayer = Convert.ToInt32(a[54]);
            secondErrorType = a[55].Replace("\"", "");
            thirdErrorPlayer = Convert.ToInt32(a[56]);
            thirdErrorType = a[57].Replace("\"", "");
            batterDest = Convert.ToInt32(a[58]);
            runner1Dest = Convert.ToInt32(a[59]);
            runner2Dest = Convert.ToInt32(a[60]);
            runner3Dest = Convert.ToInt32(a[61]);
            playOnBatter = a[62].Replace("\"", "");
            playOnRunner1 = a[63].Replace("\"", "");
            playOnRunner2 = a[64].Replace("\"", "");
            playOnRunner3 = a[65].Replace("\"", "");
            sbRunner1Flag = a[66].Contains('T');
            sbRunner2Flag = a[67].Contains('T');
            sbRunner3Flag = a[68].Contains('T');
            csRunner1Flag = a[69].Contains('T');
            csRunner2Flag = a[70].Contains('T');
            csRunner3Flag = a[71].Contains('T');
            poRunner1Flag = a[72].Contains('T');
            poRunner2Flag = a[73].Contains('T');
            poRunner3Flag = a[74].Contains('T');
            respPitcher1 = a[75].Replace("\"", "");
            respPitcher2 = a[76].Replace("\"", "");
            respPitcher3 = a[77].Replace("\"", "");
            newGameFlag = a[78].Contains('T');
            endGameFlag = a[79].Contains('T');
            pinchRunner1 = a[80].Contains('T');
            pinchRunner2 = a[81].Contains('T');
            pinchRunner3 = a[82].Contains('T');
            removedForPR1 = a[83].Replace("\"", "");
            removedForPR2 = a[84].Replace("\"", "");
            removedForPR3 = a[85].Replace("\"", "");
            removedForPH = a[86].Replace("\"", "");
            posRemovedForPH = Convert.ToInt32(a[87]);
            fielderWithPO1 = Convert.ToInt32(a[88]);
            fielderWithPO2 = Convert.ToInt32(a[89]);
            fielderWithPO3 = Convert.ToInt32(a[90]);
            fielderWithA1 = Convert.ToInt32(a[91]);
            fielderWithA2 = Convert.ToInt32(a[92]);
            fielderWithA3 = Convert.ToInt32(a[93]);
            fielderWithA4 = Convert.ToInt32(a[94]);
            fielderWithA5 = Convert.ToInt32(a[95]);
            eventNum = Convert.ToInt32(a[96]);
        }
 
        public static List<Event> GetEventList(string dir)
        {
            string s;
            string[] splitLine;
            List<Event> eventList = new List<Event>();
 
            List<string> efs = Directory.GetFiles(dir, "*.csv").ToList();
            foreach (string ef in efs)
            {
                Console.WriteLine("Reading events in " + ef);
                using (StreamReader sr = new StreamReader(ef))
                {
                    // Dispose the reader so each file handle is released promptly
                    while ((s = sr.ReadLine()) != null)
                    {
                        splitLine = s.Split(',');
                        if (splitLine.Count() == 97)
                        {
                            eventList.Add(new Event(splitLine));
                        }
                    }
                }
            }
 
            return eventList;
        }
    }
}
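The Event constructor repeats the same few parsing patterns 97 times, and its comments pair each zero-based field index with a spreadsheet-style column letter (0 = A, 26 = AA, 96 = CS). Here is a sketch of small helpers that capture those patterns; the class and method names are hypothetical. One deliberate difference: Number uses int.TryParse, so a blank field yields 0 instead of the FormatException that Convert.ToInt32 throws on an empty string.

```csharp
using System;

// Hypothetical helpers mirroring the patterns used in the Event constructor.
public static class BeventFieldSketch
{
    // Quoted text field: "\"ANA200804010\"" -> "ANA200804010"
    public static string Text(string raw) => raw.Replace("\"", "");

    // Quoted flag field: "\"T\"" -> true, "\"F\"" -> false
    public static bool Flag(string raw) => Text(raw).Trim() == "T";

    // Numeric field; blank or malformed values fall back to 0 instead of throwing.
    public static int Number(string raw) =>
        int.TryParse(Text(raw).Trim(), out int n) ? n : 0;

    // Spreadsheet-style column letters for a zero-based field index: 0 -> A, 26 -> AA, 96 -> CS.
    public static string ColumnLetters(int index)
    {
        string letters = "";
        for (int n = index + 1; n > 0; n = (n - 1) / 26)
            letters = (char)('A' + (n - 1) % 26) + letters;
        return letters;
    }
}
```

With these, a line such as `abFlag = a[36].Contains('T');` could become `abFlag = BeventFieldSketch.Flag(a[36]);`.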

I have zipped up the solution if you would like to work with it:

RetrosheetReader.zip

Next up will come some actual analysis of the data, now that all of it is being read in and stored.

Retrosheet reader

OK, so you just got back from Ohio LinuxFest 2009 (meh, not exactly a hoppin’ place this year; I was kind of disappointed), and you have converted your event files to CSV files by running the BEVENT batch file generated per last weekend’s post.

Now you would like to read in that data and start looking through it for anything useful. Well, this weekend’s post will help you with the reading-in part, and you can take it from there if you would like.

Here is the C# console code. I have put all of the CSV files, ROS (roster) files, and the TEAM???? file into a folder on my C: drive called baseball_data. If your folder is different, just change the DATA constant defined in the code below:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
 
namespace RetrosheetReader
{
 
    class Team
    {
        int year;
        string city;
        string nickname;
        string abbreviation;
        string league;
 
        public Team(int y, string[] a)
        {
            year = y;
            abbreviation = a[0];
            league = a[1];
            city = a[2];
            nickname = a[3];
        }
    }
 
    class Player
    {
        int year;
        string team;
        string playerID;
        string firstName;
        string lastName;
        string bats;
        string throws;
        string position;
 
        public Player(int y, string[] a)
        {
            year = y;
            playerID = a[0];
            lastName = a[1];
            firstName = a[2];
            bats = a[3];
            throws = a[4];
            team = a[5];
            position = a[6];
        }
    }
 
    class Event
    {
        string gameID;
        string visitingTeam;
        int inning;
        string battingTeam;
        int outs;
        int balls;
        int strikes;
        // yeah, there is still some work left here to do
        // maybe next time
 
        public Event(string[] a)
        {
            gameID = a[0];
            visitingTeam = a[1];
            inning = Convert.ToInt32(a[2]);
            battingTeam = a[3];
            outs = Convert.ToInt32(a[4]);
            balls = Convert.ToInt32(a[5]);
            strikes = Convert.ToInt32(a[6]);
        }
    }
 
    class Program
    {
        const string DATA = "c:\\baseball_data\\";
 
        static void Main(string[] args)
        {
            List<Team> teamList = new List<Team>();
            List<Player> playerList = new List<Player>();
            List<Event> eventList = new List<Event>();
 
            string s;
            string[] splitLine;
            int y;
 
            Console.WriteLine("Retrosheet Reader");
            Console.WriteLine();
 
            List<string> tfs = Directory.GetFiles(DATA, "team*").ToList();
            foreach (string tf in tfs)
            {
                y = Convert.ToInt32(Path.GetFileName(tf).Substring(4));
                using (StreamReader sr = new StreamReader(tf))
                {
                    // Dispose the reader so each file handle is released promptly
                    while ((s = sr.ReadLine()) != null)
                    {
                        splitLine = s.Split(',');
                        if (splitLine.Count() == 4)
                        {
                            teamList.Add(new Team(y, splitLine));
                        }
                    }
                }
            }
 
            List<string> rfs = Directory.GetFiles(DATA, "*.ros").ToList();
            foreach (string rf in rfs)
            {
                y = Convert.ToInt32(Path.GetFileName(rf).Substring(3).Split('.')[0]);
                using (StreamReader sr = new StreamReader(rf))
                {
                    // Dispose the reader so each file handle is released promptly
                    while ((s = sr.ReadLine()) != null)
                    {
                        splitLine = s.Split(',');
                        if (splitLine.Count() == 7)
                        {
                            playerList.Add(new Player(y, splitLine));
                        }
                    }
                }
            }
 
            List<string> efs = Directory.GetFiles(DATA, "*.csv").ToList();
            foreach (string ef in efs)
            {
                Console.WriteLine("Reading events in " + ef);
                using (StreamReader sr = new StreamReader(ef))
                {
                    // Dispose the reader so each file handle is released promptly
                    while ((s = sr.ReadLine()) != null)
                    {
                        splitLine = s.Split(',');
                        if (splitLine.Count() == 97)
                        {
                            eventList.Add(new Event(splitLine));
                        }
                    }
                }
            }
 
            Console.WriteLine("Number of teams: " + teamList.Count().ToString());
            Console.WriteLine("Number of players: " + playerList.Count().ToString());
            Console.WriteLine("Number of events: " + eventList.Count().ToString());
 
            Console.WriteLine();
            Console.Write("Strike any key to end...");
            Console.ReadKey();
        }
    }
}
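Both year lookups above rely on Retrosheet's file naming: TEAM2008 lists the teams for 2008, and a roster file such as ANA2008.ROS starts with a three-letter team code. Convert.ToInt32 throws a FormatException if some other file in the folder happens to match the pattern (a stray TEAM2008.bak, for example). Here is a sketch, with hypothetical names, that parses the year defensively so unexpected files can simply be skipped.

```csharp
using System;
using System.IO;

// Hypothetical helpers for pulling the season year out of Retrosheet file names.
public static class RetrosheetFileNames
{
    // "TEAM2008" -> 2008; returns false for names that do not follow the pattern.
    public static bool TryGetTeamFileYear(string path, out int year)
    {
        string name = Path.GetFileName(path);
        year = 0;
        return name.Length == 8
            && name.StartsWith("TEAM", StringComparison.OrdinalIgnoreCase)
            && int.TryParse(name.Substring(4), out year);
    }

    // "ANA2008.ROS" -> 2008 (three-letter team code followed by a four-digit year).
    public static bool TryGetRosterYear(string path, out int year)
    {
        string stem = Path.GetFileNameWithoutExtension(path);
        year = 0;
        return stem.Length == 7 && int.TryParse(stem.Substring(3), out year);
    }
}
```

In the team loop, `if (!RetrosheetFileNames.TryGetTeamFileYear(tf, out y)) continue;` would replace the Convert.ToInt32 call.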

I have an idea as to the first thing that I am going to look for in the Retrosheet data, so tune in next weekend and I will (hopefully) have some interesting insights.

Retrosheet.org play-by-play baseball data

I have thought for a long time that Retrosheet.org was a pretty neat web site with the potential for a ton of baseball information.

They have loads of stats on their site, but they can be hard to get at sometimes. As a result, I thought I might do some coding against their data just for fun. (I am doing this mainly because I am bored, because the Steelers are not on TV here in Ohio today, and because cable does not carry the NFL Sunday Ticket.)

The first task is to take their event data files and get them into a format that can be easily read and parsed. The Retrosheet web site has zip files containing an entire season’s worth of event data; just go to retrosheet.org, hover over Data downloads, and select Play-by-play files.

However, once you look at these event files, you discover that they need a little interpreting to get them into a nicer format for study. Luckily, Tom Tippett, David Nichols, and David W. Smith wrote a DOS application that does this: you just need to run BEVENT.EXE on each of the event files.

And of course, I couldn’t simply do that; I had to write a VB.NET console application to create a batch file to automate it.

Imports System.IO
 
Module Module1
 
    Sub Main()
 
        Console.WriteLine("BEvent Helper")
        Console.Write("Enter the directory of your event files: ")
        Dim d As String = Console.ReadLine()
 
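        ' Bail out on empty input (or end of input) and on a directory that does not exist
        If String.IsNullOrWhiteSpace(d) OrElse Not Directory.Exists(d) Then GoTo App_end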
        If Right(d, 1) <> "\" Then d = d + "\"
 
        Dim eventFilenames As List(Of String)
        eventFilenames = Directory.GetFiles(d, "*.ev?").ToList
 
        Dim s, cmd As String
        Dim sw As StreamWriter = New StreamWriter(d + "BEventHelper.bat")
        For Each f In eventFilenames
            s = Path.GetFileName(f)
            cmd = "bevent.exe -f 0-96 -y " + Left(s, 4) + " " + s + " > " + s + ".csv"
            sw.WriteLine(cmd)
        Next
        sw.WriteLine("pause")
        sw.Close()
 
App_end:
        Console.WriteLine("Press any key to end application...")
        Console.ReadKey()
 
    End Sub
 
End Module

So here is the process. Download a season zip file (find them on the Play-by-Play Data Files page) and unzip the files into a directory. Download the bevent.zip file from the Software tools page, unzip it, and place BEVENT.EXE in the directory with all the event files. Run the console application and enter the directory name at the prompt; when it is all said and done, you will have a BEventHelper.bat file in that directory. Once you run this batch file, you will end up with CSV files that correspond to the event files.
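Every line the helper writes follows one pattern: the first four characters of the event file name (the year) go to -y, -f 0-96 requests all 97 fields the reader expects, and the output is redirected to a matching .csv file. The same lines can be built in C# as sketched below; the method names are hypothetical, and the code only mirrors the VB.NET logic above, assuming event files are named like 2008ANA.EVA.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical C# mirror of the VB.NET BEventHelper logic.
public static class BeventCommandSketch
{
    // "2008ANA.EVA" -> "bevent.exe -f 0-96 -y 2008 2008ANA.EVA > 2008ANA.EVA.csv"
    public static string CommandFor(string eventFilePath)
    {
        string s = Path.GetFileName(eventFilePath);
        string year = s.Substring(0, 4);   // assumes the name starts with a four-digit year
        return $"bevent.exe -f 0-96 -y {year} {s} > {s}.csv";
    }

    // One command per *.ev? file (case-insensitive match on Windows), then "pause".
    public static List<string> BatchLines(string dir) =>
        Directory.GetFiles(dir, "*.ev?")
                 .Select(CommandFor)
                 .Append("pause")
                 .ToList();
}
```

Writing `BatchLines(dir)` to BEventHelper.bat with File.WriteAllLines would produce the same batch file as the console application.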

In the next episode, I will begin to read in the data and crank out some preliminary statistics.

Disclaimer: The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at “www.retrosheet.org”.