Using the example from class and the Asher book, suppose a poll surveys 1600 people about their views on abortion. Remember that this is a completely fictional scenario. Obviously, we cannot really break the U.S. population into just three religious groups. (See this Pew Forum report for more accurate numbers on religion in the U.S. http://religions.pewforum.org/reports) The number of respondents by religion that Asher gave as an example breaks down as follows:
GROUP N
Protestant 1150
Catholic 400
Jewish 50
Total 1600
Further suppose that for whatever question asked, they had the following percentage of people in the survey "agree" with the question:
GROUP #AGREE %AGREE ERROR
Protestant 920 920/1150 = 80% 1.18%
Catholic 260 260/400 = 65% 2.38%
Jewish 30 30/50 = 60% 8.94%
Total 1210 1210/1600 = 75.625% 1.07%
Now, we notice that the standard error for the Catholic and Jewish subgroups is quite high, and so we cannot even tell if 65% and 60% is statistically different from each other. To get better estimates, more people are sampled to increase the number of Catholic and Jewish respondents. For example:
GROUP N #AGREE %AGREE ERROR
Protestant 1150 920 920/1150 = 80% 1.18%
Catholic 1000 630 630/1000 = 63% 1.53%
Jewish 1000 640 640/1000 = 64% 1.52%
Now, we cannot simply total these numbers as we did in the previous table. If we did that, then the overall sample would not be representative of the overall population of hypothetical "Americans" simplified to include only these three religious groups. The Catholic and Jewish sub-populations were deliberately oversampled to get reasonable sampling error on those subgroups. To calculate the percentage of Americans, we need to weight these samples back to the original proportions from the first survey.
GROUP PROPORTION OF POPULATION
Protestant 1150/1600 = 0.71875
Catholic 400/1600 = 0.25
Jewish 50/1600 = 0.03125
Weighted average of "agree" percentages =
0.71875 * 80% + 0.25 * 63% + 0.03125 * 64% = 75.25%
I always like to look at the results intuitively to check that it makes sense. The weighted percent of Americans that "agree" is 75.25% which is quite a bit closer to the percentage of protestants that agree than it is to the other two groups. This makes sense, because the ratio of protestants in the original poll is quite high.
Thank you so much Professor Oyen. Although the process is not incredibly complicated I do feel that if I had attempted to transcribe it on the spot I definitely would have messed it up!
ReplyDeleteAfter our discussion in class today and after reading this example, the difference between "oversampling" and "over representation" is MUCH clearer in my book. I thought they were one and the same! Oversampling is no longer the "dirty word" I thought it was. Thanks for the clarification.
ReplyDelete