Sunday, February 10, 2013

Non-Gaussian Distribution: Marathon Finish Times

The time it takes someone to finish a marathon (26.2 miles) is not Gaussian. If it were, we would expect finish times to cluster symmetrically around a single average, with most people finishing in 4-5 hours and equally few finishing much faster or much slower. In practice the picture is complicated by many other factors, chiefly gender and age. Men are more likely to have faster times than women, but the gap varies considerably from one age decade to the next. While we might expect most people to finish in 4-5 hours, just as many may finish in 3-4 hours or in 5-6. Very few people finish in 2-3 hours, but a good number may take as long as 7 hours.

The data can also be skewed easily by the type of race the sample is drawn from. The average finish time for the Duke City Marathon, which anyone can register for, would likely be much longer than the average for the Boston Marathon, which has stringent qualifying times and many more runners. So while most data would likely show an average finish time of 4-5 hours (+/- 1 hour), many people are faster, many are slower, and numerous other factors affect that average. I would say that a gamma distribution would be a more accurate representation of the data: it is defined only for positive values, it allows a long tail of slow finishers to the right of the peak, and it is commonly used to model waiting times.
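To make the asymmetry concrete, here is a minimal Python sketch that compares a gamma sample with a Gaussian sample of the same mean and spread. The numbers are simulated, not real race results: the shape, scale, mean, and standard deviation are hypothetical values I picked only so that both samples center near 4.5 hours. Skewness (the average cubed z-score) is about zero for a symmetric distribution and positive when the slow tail is longer than the fast one.

```python
import random
import statistics

def sample_skewness(values):
    """Return the skewness of a sample: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

rng = random.Random(2013)  # fixed seed so the sketch is reproducible
n = 20_000

# Hypothetical parameters (assumptions, not fitted to any real marathon):
# both distributions have mean 4.5 hours and standard deviation 1.125 hours.
shape, scale = 16.0, 0.28125   # gamma: mean = shape * scale
mu, sigma = 4.5, 1.125         # Gaussian with matching mean and spread

gamma_times = [rng.gammavariate(shape, scale) for _ in range(n)]
gauss_times = [rng.gauss(mu, sigma) for _ in range(n)]

gamma_skew = sample_skewness(gamma_times)
gauss_skew = sample_skewness(gauss_times)

print(f"gamma skewness:    {gamma_skew:.2f}")
print(f"Gaussian skewness: {gauss_skew:.2f}")
print(f"slowest gamma time:    {max(gamma_times):.2f} h")
print(f"slowest Gaussian time: {max(gauss_times):.2f} h")
```

The gamma sample should come out with clearly positive skewness while the Gaussian sample sits near zero. That matches the pattern described above: very few finishers at 2-3 hours, but a fair number out near 7. A gamma sample can also never produce a negative finish time, which a Gaussian in principle can.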

2 comments:

  1. This is a great example of something that does not follow a Gaussian distribution. You bring up some good factors that keep it from following a normal distribution, including gender, age, and restrictions on who can run specific marathons. These confounds make me wonder whether marathon finish times might look Gaussian if they were viewed within their respective categories. For example, the distribution might look Gaussian if we looked only at the finish times of a specific age and gender group, rather than pooling all of the marathon times into one distribution.

  2. Colleen,

    I think you have a good point in noting that the distribution might look Gaussian if we set very specific parameters for which times we look at, and I agree. Looking only at men in the 25-30 age range would probably show a Gaussian distribution. For example, take the data compiled by Marathon Guide from the 2011 Boston Marathon: its distribution does indeed look Gaussian, and it nicely compares men and women, with an average finish time of 3:49:54. What it doesn't say, though, is that men ages 18-34 have to be able to run a 3:05:00 marathon just to qualify in the first place. So before you even start, your data is already skewed, and it likely reflects a much faster average than you might get "normally." Also, one factor I didn't mention before is weather, which plays a huge role in finishing times. Times compiled from a race run in 50-degree weather are going to be much faster than, say, times from the 2012 Boston Marathon, when it was 85 degrees out and many participants slowed considerably or didn't finish. Overall, what I think this shows is that the phrase "average ______" is in fact vague and doesn't tell you much about the data you're working with. You have to ask a lot of other questions to see what you're really getting the average of.

    http://www.marathonguide.com/results/browse.cfm?MIDD=15110418

