Tuesday, March 19, 2013

Sabermetrics



While reading the article on Nate Silver in Scientific American this week, I noted that while working for Baseball Prospectus he used a system of statistics called sabermetrics.  After a bit of research on Wikipedia, I found that sabermetricians use baseball statistics to devise new ways of analyzing the game that differ from the traditional measures.  One example of a sabermetric statistic is batting average on balls in play, or BABIP.  The formula is BABIP = (H - HR) / (AB - K - HR + SF), where H = hits, HR = home runs, AB = at bats, K = strikeouts, and SF = sacrifice flies.  For a pitcher this formula makes quite a bit of sense, as the traditional measure of pitching quality is earned run average (ERA), which does not account for many of the factors listed above.  One of the unique uses of BABIP is not only comparing pitchers across leagues but also comparing pitchers across time!  Although it is true that we can do this with ERA as well, ERA is influenced by many factors that it does not adjust for in the way BABIP does.
To illustrate, I used an analysis tool that graphs two different pitchers against the average for all pitchers.  For this analysis I chose Sandy Koufax and Roger Clemens.  Both were known for having amazing beginnings to their careers, so the question I wanted to answer was, “whose beginning was better?”
Luckily for me, baseball is discussed often enough that a full list of BABIP statistics is available for both men.  The graphical representation is shown below.  Clemens clearly had a better start to his career, although, as is expected with BABIP, the fluctuations from year to year are also extreme.  Going by these BABIP figures, we could say (asterisks notwithstanding) that Clemens was a better pitcher than Koufax.
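
For anyone who wants to check the arithmetic, the BABIP formula from earlier in the post can be sketched in a few lines of Python. This is only a minimal sketch: the function name `babip` and the season line passed to it are made up for illustration and are not real statistics for Koufax, Clemens, or anyone else.

```python
def babip(h, hr, ab, k, sf):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    # Denominator: at bats that ended with the ball in play,
    # plus sacrifice flies (which are not counted as at bats).
    balls_in_play = ab - k - hr + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play; BABIP is undefined")
    # Numerator: hits that stayed in the park.
    return (h - hr) / balls_in_play


# Hypothetical season line, for illustration only.
example = babip(h=150, hr=20, ab=500, k=100, sf=5)
print(f"{example:.3f}")  # 130 / 385, which rounds to 0.338
```

Note that the parentheses matter: without them, H-HR/AB-K-HR+SF would be evaluated as H minus (HR/AB) and so on, which is a very different number.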


2 comments:

  1. AJ-

    Interesting post. As a point of clarification, is the green line a BABIP or ERA average for all pitchers? From your data, I would also argue that Koufax didn't have that excellent a start; he is only slightly above average for one data point.
    However, I'd argue that neither is significantly better than average (statistically speaking) if you look at the variations throughout their careers. In fact, Koufax might even be slightly worse than average.
    I'd try to prove this myself by comparing the Koufax and Clemens averages to the overall average, but I cannot read your axes on my computer. :(

  2. It's cool, Kathleen. BABIP combines several different statistics into a single number. The green line is a BABIP average for all pitchers. Apparently the year-to-year volatility in the graph is how BABIP is expected to behave. The skill of both Koufax and Clemens is assumed anecdotally (Koufax has historically been held in higher regard), and if you look at ERA, Koufax in his inaugural year would have been competitive with Clemens. The graphical representation that I was able to find didn't allow ERA to be plotted over time as well. The average is at .300, and that does not speak particularly well for Koufax; however, Clemens' BABIP is well above .300, almost .400.
