Deuces wild???
As a baseball-loving youth, I remember with great fondness watching and listening to games called by the voice of baseball, Vin Scully, who is unquestionably one of the greatest baseball announcers.
One thing I can always remember him saying during his calls was “deuces wild”, which he used to describe a situation that occurs with a count of 2 balls, 2 strikes, and 2 outs.
So, I figured I would take a look at the raw statistics and see how often the deuces wild situation actually came up. Here is the code from my Program.cs file:
using System;
using System.Collections.Generic;
using System.Linq;

namespace RetrosheetReader
{
    class Program
    {
        static List<Team> teamList;
        static List<Player> playerList;
        static List<Event> eventList;
        const string DATA = "c:\\baseball_data\\";

        static void Main(string[] args)
        {
            Console.WriteLine("Retrosheet Reader");
            Console.WriteLine();

            teamList = Team.GetTeamList(DATA);
            playerList = Player.GetPlayerList(DATA);
            eventList = Event.GetEventList(DATA);

            Console.WriteLine("Number of teams: " + teamList.Count().ToString());
            Console.WriteLine("Number of players: " + playerList.Count().ToString());
            Console.WriteLine("Number of events: " + eventList.Count().ToString());

            int[, ,] pitchCount = new int[4, 3, 3];
            int balls, strikes, totalPitches;
            totalPitches = 0;

            foreach (var ev in eventList)
            {
                balls = 0;
                strikes = 0;
                foreach (char c in ev.pitchSequence)
                {
                    if (Functions.IsBallOrStrike(c))
                    {
                        if (balls < 0 || balls > 3)
                        {
                            Console.WriteLine("Illegal number of balls (" + balls.ToString() + ") in pitch sequence " + ev.pitchSequence);
                        }
                        else
                        {
                            pitchCount[balls, strikes, ev.outs]++;
                            totalPitches++;
                            if (Functions.IsStrike(c))
                            {
                                if (strikes == 2)
                                {
                                    if (!Functions.IsFoul(c)) strikes++;
                                }
                                else strikes++;
                            }
                            else
                            {
                                balls++;
                            }
                        }
                    }
                }
            }

            Console.WriteLine("Total pitches: " + totalPitches.ToString());
            for (int o = 0; o < 3; o++)
                for (int s = 0; s < 3; s++)
                    for (int b = 0; b < 4; b++)
                        Console.WriteLine(String.Format("Total pitches on B{0}-S{1}-O{2}: {3,8} ({4,6:P})", b, s, o, pitchCount[b, s, o], pitchCount[b, s, o] * 1.0 / totalPitches));

            Console.WriteLine();
            Console.Write("Strike any key to end...");
            Console.ReadKey();
        }
    }
}
By the way, there is a new Functions.cs file in the project that contains utility functions. I also had to modify the Event.cs file, because the BEVENT application creates duplicate records for a batter whenever some kind of on-base event, such as a stolen base or pick off, happens in the middle of the at-bat.
The results? I ran the application with the 2008 season data, and found that there were 700,242 total pitches. The deuces wild situation happened on only 17,141 pitches, or 2.45%. Of course, the highest percentage occurred with 0 balls, 0 strikes, and 0 outs (the first pitch to any batters that bat in an inning before the first out is recorded, including the first batter of each inning) with 65,050 pitches, or 9.29%.
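The percentages above are just each situation's pitch count divided by the total. Here is a minimal sketch of that arithmetic using the 2008 numbers quoted in this post; the class and method names are made up for illustration and are not part of the project. The last line adds a baseline for comparison: the array has 4 x 3 x 3 = 36 cells, so an even spread would put 1/36 of all pitches in each.

```csharp
using System;

static class DeucesWildShare
{
    // Fraction of all pitches thrown in one ball-strike-out situation.
    public static double Share(int pitchesInSituation, int totalPitches)
    {
        return (double)pitchesInSituation / totalPitches;
    }

    static void Main()
    {
        const int totalPitches = 700242;     // 2008 total quoted in the post
        const int deucesWild = 17141;        // B2-S2-O2
        const int firstPitchNoOuts = 65050;  // B0-S0-O0

        Console.WriteLine("Deuces wild: {0:P2}", Share(deucesWild, totalPitches));
        Console.WriteLine("B0-S0-O0:    {0:P2}", Share(firstPitchNoOuts, totalPitches));

        // Baseline: an even spread over the 36 count/out combinations.
        Console.WriteLine("Even spread: {0:P2}", 1.0 / 36);
    }
}
```

By that yardstick, deuces wild comes up a little less often than an average cell, while the 0-0 count with no outs comes up more than three times as often.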
Here is the zipped up solution:
Love this code! I’m having a problem though at line 40 of Program.cs. I think it’s the manner in which you increment the pitchCount multidimensional array, but I’m not sure. I do know that is where the error occurs. I have downloaded the 2012 data from Retrosheet and compiled your code, and it gets through all the data files. In fact, here is the output:
Retrosheet Reader
Reading events in c:\baseball_data\2012ANA.EVA.csv
Reading events in c:\baseball_data\2012ARI.EVN.csv
Reading events in c:\baseball_data\2012ATL.EVN.csv
Reading events in c:\baseball_data\2012BAL.EVA.csv
Reading events in c:\baseball_data\2012BOS.EVA.csv
Reading events in c:\baseball_data\2012CHA.EVA.csv
Reading events in c:\baseball_data\2012CHN.EVN.csv
Reading events in c:\baseball_data\2012CIN.EVN.csv
Reading events in c:\baseball_data\2012CLE.EVA.csv
Reading events in c:\baseball_data\2012COL.EVN.csv
Reading events in c:\baseball_data\2012DET.EVA.csv
Reading events in c:\baseball_data\2012HOU.EVN.csv
Reading events in c:\baseball_data\2012KCA.EVA.csv
Reading events in c:\baseball_data\2012LAN.EVN.csv
Reading events in c:\baseball_data\2012MIA.EVN.csv
Reading events in c:\baseball_data\2012MIL.EVN.csv
Reading events in c:\baseball_data\2012MIN.EVA.csv
Reading events in c:\baseball_data\2012NYA.EVA.csv
Reading events in c:\baseball_data\2012NYN.EVN.csv
Reading events in c:\baseball_data\2012OAK.EVA.csv
Reading events in c:\baseball_data\2012PHI.EVN.csv
Reading events in c:\baseball_data\2012PIT.EVN.csv
Reading events in c:\baseball_data\2012SDN.EVN.csv
Reading events in c:\baseball_data\2012SEA.EVA.csv
Reading events in c:\baseball_data\2012SFN.EVN.csv
Reading events in c:\baseball_data\2012SLN.EVN.csv
Reading events in c:\baseball_data\2012TBA.EVA.csv
Reading events in c:\baseball_data\2012TEX.EVA.csv
Reading events in c:\baseball_data\2012TOR.EVA.csv
Reading events in c:\baseball_data\2012WAS.EVN.csv
Number of teams: 30
Number of players: 1407
Number of events: 184590
Then I get
System.IndexOutOfRangeException was unhandled
Message=Index was outside the bounds of the array.
Source=RetrosheetReader
StackTrace:
at RetrosheetReader.Program.Main(String[] args) in c:\Users\jspatton\Downloads\RetrosheetReader\RetrosheetReader\RetrosheetReader\Program.cs:line 40
at System.AppDomain._nExecuteAssembly(Assembly assembly, String[] args)
at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()
InnerException:
I was thinking that maybe this was due to a limitation of the int, so I’ve changed it to int64, long, and double, and I run into the same issue every time. So I was obviously wrong in that respect. The next possibility is a difference between C# in 2009 and C# in 2013, which is where I’m leaning at the moment. If you could give me any advice I’d really appreciate it.
Thanks so much!
So just as an update, I modified your code for declaring pitchCount from:
int[, ,] pitchCount = new int[4, 3, 3];
to:
int[, ,] pitchCount = new int[5, 3, 3];
And I now get your results…that seems odd to me, but perhaps they changed something in the data between your original post in ’09 and now.
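For anyone else who hits this: an IndexOutOfRangeException depends on the size of the array's dimensions, not on the element type, which is why switching from int to long or double made no difference. The sketch below illustrates that with a hypothetical helper (CanIndex is not in the project). A first dimension of size 4 accepts ball counts 0 through 3 only, so a ball count of 4 fails until the dimension is widened to 5.

```csharp
using System;

static class ArrayBoundsDemo
{
    // Returns true if pitchCount[balls, 0, 0] can be incremented without throwing.
    public static bool CanIndex(int firstDimensionSize, int balls)
    {
        // long instead of int on purpose: the element type does not affect the bounds.
        long[,,] pitchCount = new long[firstDimensionSize, 3, 3];
        try
        {
            pitchCount[balls, 0, 0]++;
            return true;
        }
        catch (IndexOutOfRangeException)
        {
            return false;
        }
    }

    static void Main()
    {
        Console.WriteLine(CanIndex(4, 3)); // True: valid indices are 0..3
        Console.WriteLine(CanIndex(4, 4)); // False: 4 is past the end
        Console.WriteLine(CanIndex(5, 4)); // True: widening the dimension hides the problem
    }
}
```

Widening the array makes the crash go away, but the pitches thrown "on four balls" end up in a cell that the final report loop (b < 4) never prints.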
I found the problem: there is some unexpected data being generated by the BEVENT program in the CSV files. At some point, the data contains a pitch sequence that trips up the code because there are more pitches recorded after the 4th ball of an at-bat. After I changed the code (see above), this is what should show up:
. . .
. . .
. . .
Number of teams: 30
Number of players: 1407
Number of events: 184590
Illegal number of balls (4) in pitch sequence SBFBBBFX
Illegal number of balls (4) in pitch sequence SBFBBBFX
Total pitches: 689870
Total pitches on B0-S0-O0: 63824 (9.25 %)
. . .
. . .
. . .
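To see where a sequence like SBFBBBFX goes wrong, it helps to walk it one pitch at a time. This is a rough sketch, not the project's Functions.cs: it assumes only four pitch codes matter (B = ball, C = called strike, S = swinging strike, F = foul) and skips everything else, including X (ball in play). The name FirstIllegalPitchIndex is made up for this example.

```csharp
using System;

static class PitchWalker
{
    // Assumed classification; the real Functions.cs may treat more codes as pitches.
    static bool IsBall(char c) { return c == 'B'; }
    static bool IsFoul(char c) { return c == 'F'; }
    static bool IsStrike(char c) { return c == 'C' || c == 'S' || c == 'F'; }

    // Returns the position of the first pitch thrown when the count was already
    // past 3 balls or 2 strikes, or -1 if the sequence never gets there.
    public static int FirstIllegalPitchIndex(string pitchSequence)
    {
        int balls = 0, strikes = 0;
        for (int i = 0; i < pitchSequence.Length; i++)
        {
            char c = pitchSequence[i];
            if (!IsBall(c) && !IsStrike(c)) continue;  // not a counted pitch
            if (balls > 3 || strikes > 2) return i;    // at-bat should already be over
            if (IsBall(c)) balls++;
            else if (strikes < 2 || !IsFoul(c)) strikes++;  // a foul cannot be strike three
        }
        return -1;
    }

    static void Main()
    {
        Console.WriteLine(FirstIllegalPitchIndex("SBFBBBFX")); // 6
        Console.WriteLine(FirstIllegalPitchIndex("CBFBX"));    // -1
    }
}
```

For SBFBBBFX the count runs 0-0, 0-1, 1-1, 1-2, 2-2, 3-2, and then the foul at position 6 arrives with four balls already on the board, which is exactly the index the original array could not hold.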
If you are interested, you should be able to dive into the code at the point that it breaks and figure out what game is generating that pitch sequence, and then you could contact the Retrosheet guys and see if they have an explanation or a fix.
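One way to track down the game is to scan the raw Retrosheet event file rather than the BEVENT output. The sketch below assumes the raw file layout, where each game starts with an "id," record and each play is a "play," record with the pitch sequence in the sixth comma-separated field. SequenceFinder and FindGames are hypothetical names, and the game id in Main is a placeholder, not a real Retrosheet id.

```csharp
using System;
using System.Collections.Generic;

static class SequenceFinder
{
    // Returns "gameId line N" for every play record whose pitch field matches.
    public static List<string> FindGames(IEnumerable<string> eventFileLines, string pitchSequence)
    {
        var hits = new List<string>();
        string currentGameId = null;
        int lineNumber = 0;
        foreach (string line in eventFileLines)
        {
            lineNumber++;
            string[] fields = line.Split(',');
            if (fields[0] == "id" && fields.Length > 1)
                currentGameId = fields[1];               // a new game starts here
            else if (fields[0] == "play" && fields.Length > 5 && fields[5] == pitchSequence)
                hits.Add(currentGameId + " line " + lineNumber);
        }
        return hits;
    }

    static void Main()
    {
        // In-memory stand-in for a file; with real data you could pass
        // System.IO.File.ReadLines(path) for the event file you want to search.
        string[] sample =
        {
            "id,EXAMPLE01",                          // placeholder game id
            "play,4,1,venaw001,42,SBFBBBFX,7/F"
        };
        foreach (string hit in FindGames(sample, "SBFBBBFX"))
            Console.WriteLine(hit);                  // EXAMPLE01 line 2
    }
}
```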
Jeff:
It looks like the San Diego home data for 2012 has a pitch sequence in it that at first glance must be some kind of typo. At line 12789 of the file 2012SDN.EVN, there is the following line:
play,4,1,venaw001,42,SBFBBBFX,7/F
As best I can interpret this, Will Venable was walked in the 4th inning, but after the 4th ball a foul ball (“F”) and then a ball in play (“X”) were still entered.
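Splitting that record into its fields makes the oddity easier to spot. This sketch assumes the usual Retrosheet play layout (inning, home/visitor flag, batter id, count, pitches, event) and that an unknown count is written as "??". PlayRecord and HasPlausibleCount are hypothetical names for illustration, not part of the RetrosheetReader project.

```csharp
using System;

class PlayRecord
{
    public int Inning;
    public bool HomeTeamBatting;
    public string BatterId;
    public string Count;      // two characters: balls then strikes
    public string Pitches;
    public string EventText;

    public static PlayRecord Parse(string line)
    {
        string[] f = line.Split(',');
        if (f.Length != 7 || f[0] != "play")
            throw new FormatException("Not a play record: " + line);
        return new PlayRecord
        {
            Inning = int.Parse(f[1]),
            HomeTeamBatting = f[2] == "1",   // assumed: 0 = visitor, 1 = home
            BatterId = f[3],
            Count = f[4],
            Pitches = f[5],
            EventText = f[6]
        };
    }

    // True when the count is unknown ("??") or at most 3 balls and 2 strikes.
    public bool HasPlausibleCount()
    {
        if (Count == "??") return true;
        if (Count.Length != 2 || !char.IsDigit(Count[0]) || !char.IsDigit(Count[1]))
            return false;
        int balls = Count[0] - '0';
        int strikes = Count[1] - '0';
        return balls <= 3 && strikes <= 2;
    }

    static void Main()
    {
        PlayRecord play = Parse("play,4,1,venaw001,42,SBFBBBFX,7/F");
        Console.WriteLine("Inning {0}, batter {1}, count {2}, plausible: {3}",
            play.Inning, play.BatterId, play.Count, play.HasPlausibleCount());
    }
}
```

A check like this flags the record from its count field alone (42, meaning four balls and two strikes) before the pitch sequence is ever walked.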