Thursday, March 7, 2013

Sample Weighting


Using the example from class and the Asher book, suppose a poll surveys 1600 people about their views on abortion. Remember that this is a completely fictional scenario. Obviously, we cannot really break the U.S. population into just three religious groups. (See this Pew Forum report for more accurate numbers on religion in the U.S. http://religions.pewforum.org/reports) The number of respondents by religion that Asher gave as an example breaks down as follows:

GROUP          N
Protestant  1150
Catholic     400
Jewish        50
Total       1600

Further suppose that, for the question asked, the following numbers and percentages of respondents in each group "agree":

GROUP      #AGREE  %AGREE              ERROR
Protestant  920   920/1150 = 80%       1.18%
Catholic    260   260/400 = 65%        2.38%
Jewish      30    30/50 = 60%          6.93%
Total       1210  1210/1600 = 75.625%  1.07%
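
The ERROR column is not defined in this post. The Protestant, Catholic, and Total rows are consistent with the usual standard error of a sample proportion, sqrt(p(1-p)/n), so the short Python sketch below assumes that formula. The function name standard_error is only an illustrative choice. For the Jewish row (30 of 50), this formula gives about 6.93%.

```python
import math

def standard_error(agree, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    p = agree / n
    return math.sqrt(p * (1 - p) / n)

# (number who agree, number of respondents) from the table above
survey = {
    "Protestant": (920, 1150),
    "Catholic": (260, 400),
    "Jewish": (30, 50),
    "Total": (1210, 1600),
}

# Print each group's percentage agreeing and its standard error
for group, (agree, n) in survey.items():
    print(f"{group:<10} {agree / n:8.3%}  {standard_error(agree, n):.2%}")
```

The smaller the subgroup, the larger its standard error, which is why the 50 Jewish respondents give the least precise estimate.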

Notice that the standard errors for the Catholic and Jewish subgroups are quite high, so we cannot even tell whether 65% and 60% are statistically different from each other. To get better estimates, more people are sampled to increase the number of Catholic and Jewish respondents. For example:

GROUP       N   #AGREE  %AGREE        ERROR
Protestant 1150  920  920/1150 = 80%  1.18%
Catholic   1000  630  630/1000 = 63%  1.53%
Jewish     1000  640  640/1000 = 64%  1.52%
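
One rough way to check whether two subgroup percentages can be told apart is a z statistic for the difference of two proportions. This post does not say which test was intended, so the sketch below is an assumption: it treats the subgroups as independent samples, uses unpooled standard errors and the normal approximation, and the helper name z_for_difference is hypothetical. Values between roughly -1.96 and 1.96 mean the two percentages are not distinguishable at the usual 95% level.

```python
import math

def z_for_difference(agree_a, n_a, agree_b, n_b):
    """z statistic for the difference of two independent sample proportions
    (unpooled standard errors, normal approximation)."""
    p_a, p_b = agree_a / n_a, agree_b / n_b
    se_diff = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return (p_a - p_b) / se_diff

# Catholic vs. Jewish in the first survey: 65% vs. 60%
z_first = z_for_difference(260, 400, 30, 50)
# Catholic vs. Jewish after oversampling: 63% vs. 64%
z_oversampled = z_for_difference(630, 1000, 640, 1000)

print(f"first survey: z = {z_first:.2f}")        # about 0.68
print(f"oversampled:  z = {z_oversampled:.2f}")  # about -0.46
```

Both values fall well inside that range, so in this fictional data neither survey shows a significant Catholic/Jewish difference. What the oversampling buys is a much tighter estimate for each subgroup.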

Now, we cannot simply total these numbers as we did in the previous table. If we did, the combined sample would not be representative of the overall population of hypothetical "Americans" (simplified to include only these three religious groups), because the Catholic and Jewish subpopulations were deliberately oversampled to get reasonable sampling error on those subgroups. To calculate the percentage of Americans who agree, we need to weight these samples back to the original proportions from the first survey.

GROUP       PROPORTION OF POPULATION
Protestant  1150/1600 = 0.71875
Catholic    400/1600 = 0.25
Jewish      50/1600 = 0.03125

Weighted average of "agree" percentages =
0.71875 * 80% + 0.25 * 63% + 0.03125 * 64% = 75.25%
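
The same arithmetic can be sketched in a few lines of Python. The variable names are illustrative, and the population shares are taken from the first survey, as in the table above. For comparison, the sketch also computes the naive pooled percentage that results from simply totaling the oversampled respondents.

```python
# Group sizes in the first survey, used as the population proportions
first_survey_n = {"Protestant": 1150, "Catholic": 400, "Jewish": 50}

# Oversampled survey: (respondents, number who agree)
oversampled = {
    "Protestant": (1150, 920),
    "Catholic": (1000, 630),
    "Jewish": (1000, 640),
}

total = sum(first_survey_n.values())
weights = {group: n / total for group, n in first_survey_n.items()}

# Weighted average: each group's percentage times its population share
weighted = sum(weights[g] * agree / n for g, (n, agree) in oversampled.items())

# Naive pooling: total agrees over total respondents, ignoring oversampling
naive = (sum(agree for _, agree in oversampled.values())
         / sum(n for n, _ in oversampled.values()))

print(f"weighted: {weighted:.2%}")  # 75.25%
print(f"naive:    {naive:.2%}")     # 69.52%
```

The naive figure is pulled down toward the Catholic and Jewish percentages because those groups make up far more of the oversampled respondents than of the population.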

I always like to check results intuitively to make sure they make sense. The weighted percentage of Americans who "agree" is 75.25%, which is much closer to the percentage of Protestants who agree (80%) than to the percentages for the other two groups (63% and 64%). This makes sense, because the proportion of Protestants in the original poll is quite high.

2 comments:

  1. Thank you so much Professor Oyen. Although the process is not incredibly complicated I do feel that if I had attempted to transcribe it on the spot I definitely would have messed it up!

  2. After our discussion in class today and after reading this example, the difference between "oversampling" and "over representation" is MUCH clearer in my book. I thought they were one and the same! Oversampling is no longer the "dirty word" I thought it was. Thanks for the clarification.
