(Mis)uses of Numbers in Argument: Sabermetrics Revisited

I wanted to clarify some points that might have been unclear in my explanation of sabermetrics, and specifically of BABIP as a factor in that explanation. Sabermetrics is the statistical analysis of baseball. In class we discussed Nate Silver and how he weights different polls in order to conduct what is known as a meta-analysis. A meta-analysis combines statistics from different studies in order to examine the results as a whole (Wikipedia); in class we used Silver’s polling as our example. A weight, in statistical terms, is a coefficient assigned to a number in a computation, for example when determining an average, to make the number's effect on the computation reflect its importance (Eurostat). That is the definition; however, let us discuss why it matters to us as consumers of statistics. As we have been discussing in class, weighting is important if you want to know how a particular variable is represented in the population. For example, on Tuesday we discussed whether the number of homicides was disproportionate compared to the total population. To answer that question we had to find the weight. This is why weighting is important and how it relates to what we have discussed in class.
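To make the idea of a weight concrete, here is a minimal Python sketch of a weighted average. The poll numbers and the weights are made up for illustration; they are not Silver's actual figures or his actual method, and the function name weighted_mean is my own.

```python
def weighted_mean(values, weights):
    """Return the weighted average of values, using one weight per value."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    # Each value counts in proportion to its weight.
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical example: two polls showing 50% and 40% support.
# The first poll is assumed to be three times as reliable as the second.
polls = [50.0, 40.0]
weights = [3.0, 1.0]

simple = sum(polls) / len(polls)          # every poll counts equally
weighted = weighted_mean(polls, weights)  # the first poll counts three times as much

print(simple, weighted)  # prints: 45.0 47.5
```

The weighted result sits closer to the first poll because that poll was given more importance, which is exactly what the Eurostat definition describes.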

Batting Average on Balls in Play (BABIP) is an example of a weighted statistic. In an incredibly simplistic explanation, it measures how often a batter gets a hit on the balls he puts in play or, as it is being used in our case, how often a pitcher gives up a hit (this is the reverse of what is being observed) [thank you Professor Oyen]. To be fair and accurate in my explanation of the reasoning behind the usage of BABIP in sabermetrics, we must understand that “in play” means that the ball is hit fair and the play continues; otherwise the ball would be considered foul, or “not in play.”

So the question remains: how do we compute this, and why is each variable important to the equation? The formula is BABIP = (H - HR) / (AB - K - HR + SF), where H = hits, HR = home runs, AB = at bats, K = strikeouts, and SF = sacrifice flies. A hit occurs when the batter puts the ball in play and reaches base safely. For the batter, this variable is significant because it means they have safely reached at least first base. For the pitcher, a hit is important for the converse reason: it means a batter has successfully hit a pitch thrown by that pitcher. A home run is a hit that allows the batter to complete the circuit of bases. This is important for a batter because it scores at least one run and as many as four. Runs are the points of the game and determine who wins. A home run is significant to the pitcher for different reasons, which we will discuss in more detail when we get to the reasoning behind the variables in the BABIP calculation. At bats count how many times a player has stood at home plate with a chance of achieving a hit. For pitchers the number is interesting because it reflects how many batters they have faced. A strikeout occurs when the batter who is “at bat” accumulates three strikes, which results in the batter being sent back to the dugout. For pitchers this can be important in traditional baseball statistics; however, it is not important for BABIP, as we will discuss in the section on weights. Finally, sacrifice flies are a bit more complicated than the rest, so for these I will provide a verbatim definition from Wikipedia.

“In baseball, a sacrifice fly is a batted ball that satisfies four criteria: There are fewer than two outs when the ball is hit. The ball is hit to the outfield (fair or foul), or to infield foul territory. The batter is put out because an outfielder (or an infielder running in the outfield, or foul territory) catches the ball on the fly (alternatively if the batter would have been out if not for an error or if the outfielder drops the ball and another runner is put out). A runner who is already on base scores on the play. It is called a "sacrifice" fly because the batter presumably intends to cause a teammate to score a run, while sacrificing his own ability to do so.”

Sacrifice flies are important because they reflect a batter’s willingness to give up their own chance to reach base in order to allow a teammate to score. They are important for pitchers because a sacrifice fly means that a runner who was already on base has scored.

Now that we are aware of what all of the terms mean and why they matter, we can discuss why the weighting occurs the way that it does. Hits are counted because they are a significant factor for both the pitcher and the hitter, as discussed above, and can influence the game; home runs, however, are not. Home runs are not counted because they do not put the ball into play: the ball leaves the field and no fielder has a chance at it. The acronym BABIP stands for Batting Average on Balls in Play, and this is why we subtract home runs from hits in the first portion of the equation. So why do we divide the top portion of the equation by the variables discussed above? Because together these variables count the balls that were actually put in play, so that events which are significant but do not put the ball into play are accounted for. Strikeouts are subtracted from at bats because a strikeout, however important it is to a pitcher's success in a one-on-one challenge with a batter, ends the at bat without the ball being put in play. Home runs are subtracted again, this time from at bats, for the same reason they were subtracted from hits. Sacrifice flies are then added because they are balls put in play that are not counted as at bats. One thing that isn’t explicit in this explanation, and that I thought would be worthwhile to point out, is that for batters BABIP is a fairly direct measure of how they have performed, while for pitchers it is more indirect. By this I mean that batters have some control over their BABIP, while for pitchers controlling it is incredibly difficult.
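The formula given earlier, BABIP = (H - HR) / (AB - K - HR + SF), can be sketched in a few lines of Python. The function name babip and the stat line used below are my own inventions for illustration; they do not belong to any real player.

```python
def babip(hits, home_runs, at_bats, strikeouts, sacrifice_flies):
    """Batting Average on Balls in Play: (H - HR) / (AB - K - HR + SF)."""
    # Denominator: at bats that ended with the ball in play, plus sacrifice flies.
    balls_in_play = at_bats - strikeouts - home_runs + sacrifice_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play, so BABIP is undefined")
    # Numerator: hits that stayed in the park.
    return (hits - home_runs) / balls_in_play


# Hypothetical season: 150 hits, 20 home runs, 550 at bats,
# 100 strikeouts, and 5 sacrifice flies.
example = babip(hits=150, home_runs=20, at_bats=550,
                strikeouts=100, sacrifice_flies=5)

print(round(example, 3))  # 130 / 435, prints 0.299
```

The same function works for a pitcher if the inputs are the hits, home runs, strikeouts, and so on that the pitcher has allowed rather than the ones a batter has produced.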

(The explanations above should all be attributed to Baseball Prospectus.)

Back to the original post. So why is it that the statistics jump around so much from season to season? According to Baseball Prospectus, this is because an average major league pitcher does not maintain a steady BABIP! I did not fully understand this when I originally posted the BABIP blog post. Apparently a pitcher's BABIP should go up or down from season to season, and it should be a fairly good predictor of how the pitcher will perform in the following season. For example, in the previous post we saw that Clemens fell from 0.29 to 0.25 between one season and the next, rebounded the following season to approximately 0.33, and then fell to 0.27 within two seasons. This example seems to support the usage of BABIP to predict performance from season to season. Although Clemens went well below the average, his performance then exceeded the average and afterward fell back within its confines.

All sites referenced were accessed on 3/22/2013

http://en.wikipedia.org/sacrificefly

http://www.crawfishboxes.com/2012/11/6/3603134/talking-sabermetrics-what-does-astros-pitcher-babip-tell-us

http://en.wikipedia.org/wiki/Sabermetrics

http://www.baseballprospectus.com/glossary/index.php?search=BABIP

Friday, March 22, 2013
