2008 How to Read a Poll
Posted on June 18 at 5:14 PM
As we approach November, I anticipate a tidal wave of blog posts on polls. Reading the polling data improperly is hazardous to your health. The disconnect between the polling and the 2004 election results nearly resulted in my death. Avoid my mistakes.
1. Remember that polls are always of a population that may or may not resemble who actually goes to the polls. Only pay attention to polls that randomly select respondents. Consider how the poll selects the respondents.
For example, almost all polls used in the presidential race are based on random telephone surveys of landline telephones. I only have a cell phone. Therefore, I am not in the statistical population surveyed.
Thus, even if the poll is perfect, it might not reflect the reality at the polls in the fall, as the populations might not match.
2. A poll only shows a statistically meaningful difference between two candidates if the difference between them is more than twice the margin of error. Most political polls in the United States are designed to have a margin of error of +/- 3%. Therefore, the difference between the candidates must be greater than 6% to be anything other than a tie.
A margin of error of 3% tells us that, 95% of the time, the true percentage in the population falls within three percentage points of the number reported by the survey.
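Where does a number like 3% come from? A minimal sketch, assuming the textbook formula for a simple random sample at 95% confidence (real pollsters weight and adjust their samples, so their published margins may be computed differently). The function names here are made up for illustration.

```python
import math

Z_95 = 1.96  # z-score for a two-sided 95% confidence level


def margin_of_error(sample_size, share=0.5):
    """95% margin of error for a proportion from a simple random sample.

    share=0.5 is the worst case and is what is usually quoted.
    """
    return Z_95 * math.sqrt(share * (1.0 - share) / sample_size)


def sample_size_for(margin, share=0.5):
    """Smallest sample size whose 95% margin of error is at most `margin`."""
    return math.ceil(Z_95 ** 2 * share * (1.0 - share) / margin ** 2)


print(round(margin_of_error(1000), 3))  # about 0.031, i.e. roughly +/- 3%
print(sample_size_for(0.03))            # 1068 respondents
```

Under these assumptions, a margin of +/- 3% corresponds to a sample of roughly a thousand people, which is why so many polls land near that size.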
For example, the Rasmussen June 9, 2008 poll of Michigan voters has Obama at 45%, McCain at 42%. With a 3% margin of error, the actual percentage of the population for Obama most likely lies somewhere from 42% to 48%, and for McCain from 39% to 45%. The ranges overlap, so we cannot say that one is leading the other. This is often called a statistical tie.
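The overlap check in the Michigan example can be written out in a few lines. This is only a sketch of the rule of thumb from point 2 (a gap of no more than twice the margin of error counts as a tie); the function names are hypothetical, and the numbers are in whole percentage points.

```python
def interval(share, margin):
    """Range of plausible values: the reported share plus or minus the margin."""
    return (share - margin, share + margin)


def is_statistical_tie(share_a, share_b, margin):
    """True when the gap is no more than twice the margin of error,
    which is the same as saying the two ranges overlap or touch."""
    return abs(share_a - share_b) <= 2 * margin


print(interval(45, 3))                # (42, 48) for Obama
print(interval(42, 3))                # (39, 45) for McCain
print(is_statistical_tie(45, 42, 3))  # True: a 3-point gap is within 6
```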
Another fun thing to consider: 95% confidence means that for one in twenty polls, the true population percentage will not be in this range.
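You can watch the one-in-twenty effect happen with a quick simulation. This is a rough sketch, not a model of any real poll: it assumes a made-up true support of 45%, a perfectly random sample of 1,068 people (about what the textbook formula requires for +/- 3%), and it counts how often the reported number lands more than three points away from the truth. All names and parameters are illustrative.

```python
import random


def simulate_miss_rate(true_share=0.45, sample_size=1068, margin=0.03,
                       num_polls=2000, seed=2008):
    """Fraction of simulated polls whose result misses the true share
    by more than the margin of error."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    misses = 0
    for _ in range(num_polls):
        # Each respondent independently backs the candidate
        # with probability true_share.
        supporters = sum(rng.random() < true_share for _ in range(sample_size))
        reported = supporters / sample_size
        if abs(reported - true_share) > margin:
            misses += 1
    return misses / num_polls


print(simulate_miss_rate())  # expect something in the neighborhood of 0.05
```

Even with an idealized sample and nothing else going wrong, around one poll in twenty should land outside its own stated range.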
The practical meaning of all this? Beware of looking selectively at the poll results! If you are selective enough, you can see only the error you want to see. Net result? Suicidal thoughts in November.
3. Often the real trends are smaller than the error ranges of the surveys. We can employ two math tricks to make things better.
First, we can aggregate many surveys together and get an average of percentages. We have to be careful when estimating the confidence interval after this averaging, but we can get a better guess at the population’s true percentage just by looking at more than one survey at a time.
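A minimal sketch of the simplest kind of aggregation: pooling several polls as if they were one big sample. This is where the "be careful" part comes in. The pooled margin below is only valid under strong assumptions (every poll is an independent simple random sample of the same population, taken at about the same time, asking the same question), and it is not how any particular website does it. The poll numbers are hypothetical, made up for illustration.

```python
import math


def pooled_estimate(polls):
    """polls is a list of (share, sample_size) pairs.

    Returns (pooled share, 95% margin of error), weighting each poll
    by its sample size.
    """
    total = sum(size for _, size in polls)
    pooled = sum(share * size for share, size in polls) / total
    margin = 1.96 * math.sqrt(pooled * (1.0 - pooled) / total)
    return pooled, margin


# Hypothetical numbers for illustration only, not real poll results.
polls = [(0.45, 500), (0.47, 800), (0.44, 600)]
share, margin = pooled_estimate(polls)
print(f"{share:.3f} +/- {margin:.3f}")
```

The pooled margin comes out smaller than the margin of any single poll in the list, which is the whole appeal of aggregating.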
The second trick is to use moving averages as a mathematically safe way to sort out random ups-and-downs in the poll numbers from the real longer term changes in the sampled population.
Think of how much your weight changes each day, depending on when you last went to the bathroom, how much water you’ve drunk and so on. The change on a day-by-day basis is far larger than what you’ll typically gain or lose in a week. So, if you measure your weight each day, and then average together the last seven days, you end up smoothing out most of the day-to-day variance. Left behind is the actual change on a week-long basis. We can use the same math on the polls.
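The weight example can be sketched as a seven-day moving average. This is a bare-bones version that only reports an average once a full window of days is available; the daily weights are invented for illustration, and the function name is hypothetical.

```python
def moving_average(values, window=7):
    """Average of each run of `window` consecutive values.

    Returns one average per full window, so the result has
    len(values) - window + 1 entries (or none if there are too few values).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [sum(values[end - window:end]) / window
            for end in range(window, len(values) + 1)]


# Invented daily weights in pounds, two weeks' worth.
daily_weights = [180, 182, 179, 181, 183, 180, 182,
                 179, 181, 178, 180, 182, 179, 180]

for average in moving_average(daily_weights):
    print(round(average, 1))
```

The daily readings bounce around by several pounds, while the averaged series moves much more slowly. Swap the weights for a candidate's daily poll numbers and the same function applies.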
Quite a few websites do all of this for us: they limit themselves to polls with some statistical rigor, base their analysis on the confidence intervals, and aggregate multiple polls together in a moving average. None are perfect, but I’ve taken a shine to electoral-vote.com for its non-commercial goodness and openness. I think the site is too aggressive in calling states (Michigan is listed as barely Obama; I think it should be a toss-up), but overall it’s a decent place to start.
Several readers have pointed me to www.fivethirtyeight.com as the source for aggregated poll data on the 2008 election.
This is great Science. Have you checked out fivethirtyeight.com yet? Election data porn for DAYS…complete with charts! Would love to hear your thoughts on its methodologies.
Posted by sherman | June 18, 2008 5:22 PM
no, dude, you need more than a cursory glance at fivethirtyeight.com. 538 uses detailed methodology to apply trends across congressional districts with similar demographics. The guy who runs it is the same genius who created PECOTA for the Baseball Prospectus.
Trust me, electoral-vote.com is so 2004. This year it’s all about Nate at 538.
Posted by el ganador | June 18, 2008 6:43 PM
538 is the real deal.
Nate Silver is the brains behind it and the guy is a math wizard - he’s a legend in baseball circles for being by far the most accurate at ball player projections (PECOTA)
You need to read his FAQ - much of the stuff is over the head of anyone without a PhD in statistics but it is grounded in hard science. He was recently hired by Rasmussen and appeared on CNN - FWIW the ‘main stream’ is taking him very seriously.
Posted by DavidC | June 18, 2008 6:45 PM