Election 2012 – Pooling Polls Works
Written by Jeff on November 10, 2012 – 11:58 pm

Poll aggregators Simon Jackman, Drew Linzer, Nate Silver, and Sam Wang all made extremely accurate predictions for both the 2012 Electoral and popular vote outcomes. Why were they so good?  The most important reason is that they all developed statistical models based on state polls.  They also avoided subjectivity by using consistent statistical criteria for handling potential “house biases” of polls (e.g., weighting polls or using median statistics).

This is where the similarity in their methods ends. Nate Silver’s model may be the most complex and theoretical, integrating economic data, state polls, and national polls into a predictive model.  Sam Wang’s model, on the other hand, relies only on meta-analysis of recent state polls.  But even though the models are very different, they all converged on similar predictions because they relied on state polls.

Was the complexity of these models necessary? The answer appears to be “No.”  Just pooling polls within states is as effective as more complex models. Imagine you are a super pollster with different polling groups supplying you with the percentages for each candidate, the sample sizes, and the span of time over which the data were collected.  You pool these polls over some larger window of time.  The resulting super-state polls have much larger sample sizes than any single poll, and the potential biases of individual polling groups may tend to cancel out. I generated super-state polls from state polls taken from approximately the beginning of September until November 5th, and they predicted the winner correctly in all the states and the District of Columbia.  In addition to predicting the electoral vote exactly (332 vs. 206), the predicted differences between Obama and Romney for each state were highly correlated with the actual differences (r = 0.98), as illustrated in Figure 1.

Figure 1. Plot of the predicted differences between Obama and Romney for each polled state (x-axis) against the actual outcome as of 11-9-12.
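To make the pooling idea concrete, here is a minimal sketch in Python. It is not the code I used for the numbers above. It assumes that pooling means a sample-size-weighted average of each candidate's percentage over the polls in the window. The Poll class, the pool_polls function, and the example polls are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Poll:
    """One state poll (hypothetical structure, for illustration only)."""
    pollster: str
    end_date: date
    sample_size: int
    obama_pct: float
    romney_pct: float


def pool_polls(polls, start, end):
    """Pool every poll ending in [start, end] into one super-state poll.

    Assumption: each candidate's pooled percentage is the average of the
    individual poll percentages, weighted by sample size.
    """
    in_window = [p for p in polls if start <= p.end_date <= end]
    total_n = sum(p.sample_size for p in in_window)
    if total_n == 0:
        return None  # no polls in the window, nothing to pool
    obama = sum(p.obama_pct * p.sample_size for p in in_window) / total_n
    romney = sum(p.romney_pct * p.sample_size for p in in_window) / total_n
    return {
        "n": total_n,
        "obama_pct": obama,
        "romney_pct": romney,
        "margin": obama - romney,  # positive means Obama leads
    }


# Made-up polls for a single state, not real 2012 data.
example_polls = [
    Poll("Pollster A", date(2012, 9, 15), 600, 50.0, 46.0),
    Poll("Pollster B", date(2012, 10, 10), 1000, 48.0, 47.0),
    Poll("Pollster C", date(2012, 11, 3), 400, 51.0, 47.0),
]

super_poll = pool_polls(example_polls, date(2012, 9, 1), date(2012, 11, 5))
print(super_poll)
```

With these made-up numbers the super-state poll has a sample size of 2,000 and a pooled margin of about 2.5 points for Obama. The predicted winner of the state is just the sign of that margin.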

This raises an interesting problem.  Since pooled polls were as predictive as any other statistical approach to aggregating polls and produced a near-perfect correlation with the actual voting results (Figure 1), how could we determine whether a new statistical approach for aggregating polls is better than pooling?  It doesn’t appear possible, since there is very little room left for meaningful improvement.  In future elections, however, events may occur that significantly alter voter preferences, and these events might be best detected by Sam Wang’s meta-analytic approach or by pooling polls with smaller windows (i.e., creating super tracking polls with windows of a month, two weeks, or even a week) for highly polled swing states.
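The smaller-window idea can be sketched the same way. This is only an illustration under the same assumption as before (a sample-size-weighted average), with a hypothetical pooled_margin function and made-up polls stored as (end date, sample size, Obama %, Romney %) tuples.

```python
from datetime import date, timedelta


def pooled_margin(polls, last_day, window_days):
    """Pooled Obama-minus-Romney margin over the last `window_days` days.

    `polls` is a list of (end_date, sample_size, obama_pct, romney_pct)
    tuples. Assumption: margins are averaged with sample-size weights.
    """
    first_day = last_day - timedelta(days=window_days - 1)
    recent = [p for p in polls if first_day <= p[0] <= last_day]
    total_n = sum(n for _, n, _, _ in recent)
    if total_n == 0:
        return None  # no polls fall inside this window
    return sum(n * (obama - romney) for _, n, obama, romney in recent) / total_n


# Made-up polls for one swing state with a late shift toward Obama.
swing_state_polls = [
    (date(2012, 10, 1), 800, 47.0, 49.0),
    (date(2012, 10, 20), 1000, 48.0, 48.0),
    (date(2012, 10, 30), 600, 50.0, 47.0),
    (date(2012, 11, 4), 600, 51.0, 47.0),
]

last_day = date(2012, 11, 5)
for days in (60, 30, 14, 7):
    print(days, pooled_margin(swing_state_polls, last_day, days))
```

In this made-up example the 60-day window gives a margin under one point, while the 7-day window gives 3.5 points, so the shorter window picks up the late shift that the longer one dilutes. The cost is a smaller pooled sample, which is why this only makes sense for highly polled states.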
