I teach a graduate-level course every spring semester on survey and experiment methods in economics and the social sciences. In this election season, I thought it might be worthwhile to share a few of the things I discuss in the course so that you might more intelligently interpret some of the survey research results being continuously reported in the newspapers and on the nightly news.
You've been hiding under a rock if you haven't by now seen reports of polls on the likelihood of Trump or Clinton winning the presidential election. Almost all these polls will report (often in small font) something like "the margin of error is plus or minus 3 percent".
What does this mean?
In technical lingo, it means the "sampling error" is +/- 3% with 95% confidence. This error arises because the polling company doesn't survey every single voter in the U.S., and it is the error you see reported alongside the polls. Let's say the projected percent vote for Trump is 45% with a "margin of error" of 3%, giving a range of 42% to 48%. The interpretation would be that if we were to repeatedly sample potential voters and compute a range like this each time, about 95% of those ranges would contain the true voting percentage for Trump.
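For readers who like to see the arithmetic, here is a minimal Python sketch of the textbook normal-approximation formula for the margin of error of a proportion: z times the square root of p(1 - p)/n, where z is about 1.96 at 95% confidence. The 45% figure and the sample size of 1,067 are illustrative assumptions, and real pollsters may use more elaborate formulas that account for how their sample was drawn.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p_hat: float, n: int, confidence: float = 0.95) -> float:
    """Half-width of the normal-approximation confidence interval for a proportion."""
    # z is the standard normal quantile; about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sqrt(p_hat * (1 - p_hat) / n)


p_hat = 0.45  # hypothetical: 45% of respondents say they back Trump
n = 1067      # hypothetical sample size
moe = margin_of_error(p_hat, n)
low, high = p_hat - moe, p_hat + moe
print(f"{p_hat:.0%} +/- {moe:.1%} -> ({low:.1%}, {high:.1%})")
```

With these inputs the sketch prints a margin of about 3.0 percentage points and a range of roughly 42% to 48%.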
The thought experiment goes like this: imagine you had a large basket full of a million black and white balls. You want to know the percentage of balls in the basket that are black. How many balls would you have to pull out and inspect before you could be confident of the proportion of balls that are black? We can construct many such baskets where we know the truth about the proportion of black balls and try different experiments to see how accurate we are in many repeated attempts where we, say, pull out 100, 1,000, or 10,000 balls. The good news is that we don't have to manually do these experiments because statisticians have produced precise mathematical formulas that give us the answers we want.
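We don't have to run these experiments by hand, but a computer can run them cheaply. The sketch below is a hypothetical simulation, not anything a pollster actually runs: it fills a basket with 1,000,000 balls, 45% of them black (an assumed share), repeatedly draws 100, 1,000, or 10,000 balls, and counts how often the sample's share of black balls lands within 3 percentage points of the truth. The function name share_within is my own, for illustration.

```python
import random


def share_within(basket, n, tolerance=0.03, trials=200, seed=1):
    """Fraction of repeated samples of size n whose black-ball share lands
    within `tolerance` of the basket's true black-ball share."""
    rng = random.Random(seed)  # fixed seed so the results are reproducible
    true_share = sum(basket) / len(basket)
    hits = 0
    for _ in range(trials):
        draw = rng.sample(basket, n)  # pull n balls without replacement
        if abs(sum(draw) / n - true_share) <= tolerance:
            hits += 1
    return hits / trials


# Hypothetical basket: 1,000,000 balls, 45% black (1) and 55% white (0)
basket = [1] * 450_000 + [0] * 550_000
results = {n: share_within(basket, n) for n in (100, 1_000, 10_000)}
for n, share in results.items():
    print(f"n={n:>6}: within 3 points in {share:.0%} of trials")
```

With 1,000 balls per draw, close to 95% of samples should land within 3 points, which is the 95% confidence idea in miniature; with 100 balls it happens far less often, and with 10,000 almost always.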
As it turns out, you need to sample about 1,000 to 1,500 people (1,067 to be precise, assuming 95% confidence and a worst-case 50-50 split) out of the U.S. population to get a sampling error of 3%, and thus most polls use a sample of this size. Why not a 1% sampling error, you might ask? Well, you'd need to survey almost 10,000 respondents to achieve a 1% sampling error, and the roughly ninefold increase in cost is probably not worth a measly two-percentage-point increase in accuracy.
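The "precise mathematical formula" behind the 1,067 figure is the standard sample-size formula for a proportion, n = z^2 p(1 - p) / E^2, where E is the desired margin of error and p = 0.5 is the worst case. Here is a minimal Python sketch; the function name required_sample_size is my own, for illustration.

```python
from statistics import NormalDist


def required_sample_size(moe: float, p: float = 0.5, confidence: float = 0.95) -> float:
    """Sample size needed for a given margin of error: n = z^2 * p * (1 - p) / moe^2."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 at 95% confidence
    return z ** 2 * p * (1 - p) / moe ** 2


n3 = required_sample_size(0.03)  # 3-point margin of error
n1 = required_sample_size(0.01)  # 1-point margin of error
print(round(n3), round(n1), round(n1 / n3, 1))
```

This prints 1067, 9604, and 9.0. Because E is squared, cutting the margin of error to a third takes nine times the sample. The formula also leaves out the finite-population correction, which is negligible when the population is vastly larger than the sample, which is why the U.S. population size doesn't show up.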
Here is a key point: the 3% "margin of error" you see reported on the nightly news is only one kind of error. The true error is likely much larger because there are many types of error besides sampling error. However, these other types of error are more difficult to quantify and thus go unreported.
For example, a prominent kind of error is "selection bias" or "non-response error," which comes about because the people who choose to answer the survey or poll may be systematically different than the people who choose not to answer it. Alas, response rates to surveys have been falling quite dramatically over time, even for "gold standard" government surveys (see this paper or listen to this podcast). Curiously, those nightly news polls don't tell you the response rate, but my guess is that it is typically far less than 10%, meaning that fewer than 10% of the people they tried to contact actually told them whether they intend to vote for Trump, Clinton, or someone else. That means more than 90% of the people they contacted wouldn't talk to them. Is there something special about the ~10% willing to talk to the pollsters that sets them apart from the ~90% of non-respondents? Probably. Respondents are probably much more interested in and passionate about their candidate and politics in general. And yet we, the consumers of polling information, are rarely told anything about this potential error.
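To see why a low response rate can matter more than sample size, here is a back-of-the-envelope Python sketch with entirely hypothetical numbers: an electorate split 50-50 in which one candidate's supporters answer the pollster 12% of the time and everyone else 8% of the time. The function name respondent_share is my own, for illustration.

```python
def respondent_share(pop_share, rate_supporters, rate_others):
    """Share of *respondents* backing a candidate when supporters and
    non-supporters answer the poll at different rates."""
    responding_supporters = pop_share * rate_supporters
    responding_others = (1 - pop_share) * rate_others
    return responding_supporters / (responding_supporters + responding_others)


# Hypothetical: true support is 50%, supporters respond at 12%, others at 8%
pop_share = 0.50
overall_response_rate = pop_share * 0.12 + (1 - pop_share) * 0.08
poll_share = respondent_share(pop_share, 0.12, 0.08)
print(f"response rate {overall_response_rate:.0%}, "
      f"poll says {poll_share:.0%}, truth is {pop_share:.0%}")
```

The overall response rate comes out to 10%, yet the poll reports 60% support for a candidate who actually has 50%. Sample size appears nowhere in the calculation, so polling more people would not shrink this 10-point error, and the reported +/- 3% says nothing about it.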
One way pollsters try to partially "correct" for non-response error is through weighting. To give a sense of how this works, consider a simple example. Let's say I surveyed 1,000 Americans and asked whether they prefer vanilla or chocolate ice cream. When I get my data back, I find that there are 650 males and 350 females. Apparently males were more likely to take my survey. Knowing that males might have different ice cream preferences than females, I know that my estimate of the most popular flavor will likely be biased if I don't do something. So, I can create a weight. I know that the U.S. population is roughly 50% male and 50% female (in actuality, there are slightly more females than males, but let's put that aside). So, what I need to do is make the female respondents "count" more in the final answer than the males. When we typically take an average, each person has a weight of one (we add up all the answers, implicitly multiplied by a weight of one, and divide by the total). A simple correction in our ice cream example would be to give females a weight of 0.5/0.35 = 1.43 and males a weight of 0.5/0.65 = 0.77. Females will count more than one and males will count less. Then I report a weighted average: add up all the female answers (each multiplied by 1.43), add to them all the male answers (each multiplied by 0.77), and divide by the sum of the weights, which here works out to the original total of 1,000.
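Here is the ice cream example as a short Python sketch. The 650/350 split and the 0.5/0.35 and 0.5/0.65 weights come from the example; the chocolate preferences (60% of males, 40% of females) are made up purely for illustration.

```python
# Hypothetical responses: 1 = prefers chocolate, 0 = prefers vanilla.
# The 60% / 40% chocolate shares by gender are invented for illustration.
respondents = (
    [("male", 1)] * 390 + [("male", 0)] * 260        # 650 males, 60% chocolate
    + [("female", 1)] * 140 + [("female", 0)] * 210  # 350 females, 40% chocolate
)

population_share = {"male": 0.5, "female": 0.5}  # assumed 50-50 population
n = len(respondents)
sample_share = {g: sum(1 for sex, _ in respondents if sex == g) / n
                for g in population_share}

# Weight = population share / sample share: about 0.77 for males, 1.43 for females
weights = {g: population_share[g] / sample_share[g] for g in population_share}

unweighted = sum(ans for _, ans in respondents) / n
# Weighted average: sum of weight * answer, divided by the sum of the weights
weighted = (sum(weights[sex] * ans for sex, ans in respondents)
            / sum(weights[sex] for sex, _ in respondents))
print(f"unweighted {unweighted:.1%}, weighted {weighted:.1%}")
```

Unweighted, chocolate wins 53% to 47% because the sample has too many males; weighted, the answer is the 50% you would get from a 50-50 population with these preferences. Dividing by the sum of the weights rather than the head count keeps the average correct even when the weights don't add up to exactly 1,000.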
Problem solved, right? Hardly. For one, gender is not a perfect predictor of ice cream preference. And the reason someone chooses to respond to my survey almost certainly has to do with more than gender. Moreover, weights can only be constructed using variables for which we know the "truth," that is, variables for which Census Bureau data reveal the characteristics of the whole population. But in the case of political polling, we aren't trying to match the universe of U.S. citizens but the universe of U.S. voters. Determining the characteristics of voters is a major challenge, and those characteristics are in constant flux.
In addition, when we create weights, a few people can end up having a disproportionate effect on the final outcome, dramatically increasing the possible error. Yesterday, the New York Times ran a fantastic story by Nate Cohn illustrating exactly how this can happen. Here are the first few paragraphs:
Here's a figure they produced showing how this sort of "extreme" weighting affects the polling result reported:
The problem here is that when one individual in the sample counts 30 times more than the typical respondent, the effective sample size is much smaller than the actual sample size, and the "margin of error" is much larger than +/- 3%.
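One common way to put a number on this is Kish's approximation for the effective sample size under unequal weights: n_eff = (sum of the weights)^2 / (sum of the squared weights). I'm not claiming this is exactly how the Times computed its figures; the sketch below just applies the formula to a hypothetical poll of 1,000 people in which one respondent gets a weight of 30 and everyone else a weight of 1. The function names are my own, for illustration.

```python
from math import sqrt


def kish_effective_n(weights):
    """Kish's approximation: n_eff = (sum of w)^2 / (sum of w^2)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


def moe_for_n(n, p=0.5, z=1.96):
    """95% margin of error for a proportion at (effective) sample size n."""
    return z * sqrt(p * (1 - p) / n)


# Hypothetical poll: 1,000 respondents, one of whom gets a weight of 30
weights = [1.0] * 999 + [30.0]
n_eff = kish_effective_n(weights)
print(f"n_eff = {n_eff:.0f}, margin of error = {moe_for_n(n_eff):.1%} "
      f"vs {moe_for_n(len(weights)):.1%} unweighted")
```

That single heavily weighted respondent shrinks the effective sample to about 558 and pushes the margin of error from about 3.1% to about 4.2%, even though 1,000 people were interviewed.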
There are many additional types of biases and errors that can influence survey results (e.g., How was the survey question asked? Is there interviewer bias? Is the sample drawn from a list of all likely voters?). This doesn't make polling useless. But it does mean that one needs to be a savvy consumer of polling results. It's also why it's often useful to look at aggregations across lots of polls or, my favorite, betting markets.