We covered the Gaussian (normal) distribution today. This is one of the most common distributions in use, and we discussed why it is a reasonable distribution for many kinds of data. However, there are plenty of data sets that are not Gaussian (or normally distributed). Can you describe a random number (either one we have looked at in class, or something else you have found) that you think is NOT Gaussian? Describe the characteristics of the data that lead you to believe that it does not fit the Gaussian distribution. If possible, name another distribution that might be more reasonable (look through some of the early posts on this blog about probability distributions).
Here is one example. The number of years that a student takes to get a Bachelor's degree is not Gaussian. If it were Gaussian, then we would expect most students to take about 5 years, plus or minus one year, with a few students finishing in 3 years and an equal number finishing in 7 years. However, we know that very few students finish in 3 years and almost no students finish in 2 years. On the other end, plenty of students finish in 7, 8, or even 9 years. In other words, the true distribution is not symmetric around the mean: few students finish much faster than average, while many students take much longer than average. A gamma distribution would better represent this data because it is skewed to the right: its mean (5 or so) sits closer to the minimum (2) than to the practical maximum (10 or so).
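A quick simulation can make this asymmetry concrete. The sketch below assumes a gamma distribution with shape 10 and scale 0.5 (mean 5 years, standard deviation about 1.6 years); these parameters are illustrative only, not fitted to real graduation data, and a realistic model would probably need a shifted gamma so that durations under about 2 years are impossible. It checks the skewness and compares how often a student lands more than 2 years below versus more than 2 years above the mean.

```python
import random
import statistics

# Hypothetical parameters: shape k=10, scale theta=0.5 -> mean k*theta = 5 years.
SHAPE, SCALE = 10.0, 0.5

def sample_skewness(xs):
    """Sample skewness: average cubed z-score (positive means a long right tail)."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum(((x - m) / sd) ** 3 for x in xs) / len(xs)

random.seed(1)
# random.gammavariate(alpha, beta) uses alpha as shape and beta as scale.
years = [random.gammavariate(SHAPE, SCALE) for _ in range(100_000)]

mean = statistics.fmean(years)
median = statistics.median(years)
skew = sample_skewness(years)

# Compare the two tails: a Gaussian would make these fractions equal.
below = sum(y < mean - 2 for y in years) / len(years)
above = sum(y > mean + 2 for y in years) / len(years)

print(f"mean={mean:.2f}  median={median:.2f}  skewness={skew:.2f}")
print(f"P(more than 2 yr faster)={below:.3f}  P(more than 2 yr slower)={above:.3f}")
```

With a right-skewed gamma, the mean lands above the median and the slow tail is heavier than the fast tail, which is the pattern described in the comment.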
Another example would be the attenuation of a flood as it propagates downstream. The mean of the flow changes over time as the flow is restricted by friction forces, such as drag. The data show characteristics similar to a Poisson distribution rather than a Gaussian distribution.
As Professor Oyen discussed in class, sometimes we try too hard to match a Gaussian distribution (a deeply rooted mathematical concept) to data that it does not fit as well as it should. For example, for a long time economists assumed that the incomes in a society followed a roughly Gaussian distribution. After research showed this assumption to be wrong, because income inequality skews the distribution, many mathematical models started accounting for this skewness. This is why the "median household income" is usually reported rather than the "average household income": a small number of very high incomes pulls the mean upward, while the median is barely affected.
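To see why the median is the preferred summary for skewed data, here is a small sketch. It assumes, purely for illustration, that incomes follow a lognormal distribution (a common textbook model for income) with made-up parameters: a median of 50,000 and a spread of 0.8 on the log scale. It compares the mean and median, then adds ten hypothetical very high earners to see which summary moves more.

```python
import math
import random
import statistics

random.seed(2)

# Hypothetical income model: lognormal with median exp(MU) = 50,000 and log-scale spread SIGMA.
MU, SIGMA = math.log(50_000), 0.8
incomes = [random.lognormvariate(MU, SIGMA) for _ in range(50_000)]

mean_income = statistics.fmean(incomes)
median_income = statistics.median(incomes)
print(f"mean={mean_income:,.0f}  median={median_income:,.0f}")

# Add ten hypothetical very high earners and see which summary moves more.
with_rich = incomes + [50_000_000.0] * 10
mean_shift = statistics.fmean(with_rich) - mean_income
median_shift = statistics.median(with_rich) - median_income
print(f"mean moved by {mean_shift:,.0f}; median moved by {median_shift:,.0f}")
```

The long right tail alone puts the mean well above the median, and a handful of extreme incomes moves the mean by thousands while the median shifts only slightly.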
-(2)Xavier Maqueo
Anything restricted to a fixed range would not strictly follow a Gaussian distribution. For example, consider the number of hours that teenagers sleep on a school night: most would probably sleep between 5 and 10 hours, with very, very few sleeping 0 or 24 hours (unlikely, but not flat-out impossible). That distribution would give a decent bell curve. However, nobody could sleep a negative number of hours, or 25 hours in a 24-hour day, so the Gaussian's tails, which extend forever in both directions, do not match reality. A gamma distribution would probably also be more appropriate, because practically speaking, I'd be less surprised by a teenager sleeping 3 hours a night than by a teenager sleeping 20 hours a night, which places the mean closer to the minimum than to the maximum.
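The point about the tails can be checked directly with Python's statistics.NormalDist. This sketch assumes a hypothetical normal model of school-night sleep with mean 7.5 hours and standard deviation 1.5 hours (illustrative numbers, not survey data). It shows that the model assigns a small but nonzero probability to negative sleep, and that it treats 4.5 hours below the mean and 4.5 hours above it as exactly equally likely, which is the symmetry the comment argues against.

```python
from statistics import NormalDist

# Hypothetical normal model of school-night sleep (hours); parameters are illustrative.
sleep = NormalDist(mu=7.5, sigma=1.5)

# Probability the model assigns to physically impossible values.
p_negative = sleep.cdf(0.0)          # fewer than 0 hours
p_over_day = 1.0 - sleep.cdf(24.0)   # more than 24 hours (vanishingly small; may print as 0)

# The normal curve is symmetric: 4.5 hours below the mean is as likely
# as 4.5 hours above it.
p_3_or_less = sleep.cdf(3.0)
p_12_or_more = 1.0 - sleep.cdf(12.0)

print(f"P(sleep < 0 h)   = {p_negative:.2e}")
print(f"P(sleep > 24 h)  = {p_over_day:.2e}")
print(f"P(sleep <= 3 h)  = {p_3_or_less:.5f}")
print(f"P(sleep >= 12 h) = {p_12_or_more:.5f}")
```

A skewed, nonnegative model such as the gamma would instead put zero probability below 0 hours and allow the two tails to differ.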
I know this question is quite old, but I would like to provide my own answer. One distribution that we have used several times in class is the Bernoulli distribution, exemplified by a coin flip landing heads or tails. It has a binary outcome: success or failure. Whereas a Gaussian distribution is continuous, with an infinite number of possible outcomes (most of them vanishingly unlikely), the Bernoulli has only two.
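A short simulation shows the contrast: every Bernoulli draw is one of exactly two values. This sketch encodes heads as 1 and tails as 0 and assumes a fair coin (p = 0.5). Over many draws, the sample mean should approach p and the sample variance should approach p(1 - p), but no single draw ever takes any value other than 0 or 1.

```python
import random
import statistics

random.seed(3)
P_HEADS = 0.5  # assumed fair coin

# Each Bernoulli trial: 1 for heads (success) with probability P_HEADS, else 0 (tails).
flips = [1 if random.random() < P_HEADS else 0 for _ in range(10_000)]

outcomes = set(flips)
mean = statistics.fmean(flips)
variance = statistics.pvariance(flips)

print(f"distinct outcomes: {sorted(outcomes)}")
print(f"sample mean={mean:.3f} (theory {P_HEADS})")
print(f"sample variance={variance:.3f} (theory {P_HEADS * (1 - P_HEADS)})")
```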