
Polling 101

I teach a graduate-level course every spring semester on survey and experiment methods in economics and the social sciences.  In this election season, I thought it might be worthwhile to share a few of the things I discuss in the course so that you might more intelligently interpret some of the survey research results being continuously reported in the newspapers and on the nightly news. 

You've been hiding under a rock if you haven't by now seen reports of polls on the likelihood of Trump or Clinton winning the presidential election.  Almost all these polls will report (often in small font) something like "the margin of error is plus or minus 3 percent".  

What does this mean?

In technical lingo it means the "sampling error" is +/- 3% with 95% confidence.  This is the error that comes about from the fact that the polling company doesn't survey every single voter in the U.S.  Because not every single voter is sampled, there will be some error, and this is the error you see reported alongside the polls.  Let's say the projected percent vote for Trump is 45% with a "margin of error" of 3%.  The interpretation would be that if we were to repeatedly sample potential voters, 95% of the time we would expect to find a voting percentage for Trump that is between 42% and 48%.

The thought experiment goes like this: imagine you had a large basket full of a million black and white balls.  You want to know the percentage of balls in the basket that are black.  How many balls would you have to pull out and inspect before you could be confident of the proportion of balls that are black?  We can construct many such baskets where we know the truth about the proportion of black balls and try different experiments to see how accurate we are in many repeated attempts where we, say, pull out 100, 1,000, or 10,000 balls.  The good news is that we don't have to manually do these experiments because statisticians have produced precise mathematical formulas that give us the answers we want.  
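
If you want to see this thought experiment in action rather than take the formulas on faith, here is a minimal simulation sketch in Python. The 30% share of black balls and the sample size of 1,067 are just illustrative assumptions:

```python
import random

# Illustrative assumptions: a basket of 1,000,000 balls, 30% of them black,
# and a poll-sized sample of 1,067 balls drawn at random each time.
TRUE_SHARE = 0.30
BASKET_SIZE = 1_000_000
SAMPLE_SIZE = 1_067
N_EXPERIMENTS = 2_000

n_black = int(TRUE_SHARE * BASKET_SIZE)
basket = [1] * n_black + [0] * (BASKET_SIZE - n_black)

within_3_points = 0
for _ in range(N_EXPERIMENTS):
    sample = random.sample(basket, SAMPLE_SIZE)      # pull out 1,067 balls
    estimate = sum(sample) / SAMPLE_SIZE             # share of black balls in the sample
    if abs(estimate - TRUE_SHARE) <= 0.03:           # within +/- 3 points of the truth?
        within_3_points += 1

# With a sample of ~1,067, about 95% (or a bit more) of the repeated samples
# land within +/- 3 points of the true share.
print(f"Share of samples within 3 points: {within_3_points / N_EXPERIMENTS:.1%}")
```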

As it turns out, you need to sample about 1,000 to 1,500 people (the answer is 1,067 to be precise) out of the U.S. population to get a sampling error of 3%, and thus most polls use this sample size.  Why not a 1% sampling error, you might ask?  Well, you'd need to survey almost 10,000 respondents to achieve a 1% sampling error, and the roughly 10x increase in cost is probably not worth a measly two-percentage-point gain in precision. 
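
The textbook formula behind those numbers is n = z²·p(1−p)/e², evaluated at the worst case of p = 0.5.  A quick sketch:

```python
def required_sample_size(margin_of_error, p=0.5, z=1.96):
    """Sample size needed for a given margin of error at 95% confidence.

    Standard large-population approximation n = z^2 * p * (1 - p) / e^2,
    with p = 0.5 as the worst case (it maximizes the variance p * (1 - p)).
    """
    return z ** 2 * p * (1 - p) / margin_of_error ** 2

print(round(required_sample_size(0.03)))  # ~1,067 respondents for +/- 3 points
print(round(required_sample_size(0.01)))  # ~9,604 respondents for +/- 1 point
```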

Here is a key point: the 3% "margin of error" you see reported on the nightly news is only one kind of error.  The true error rate is likely something much larger because there are many additional types of error besides just sampling error. However, these other types of errors are more difficult to quantify, and thus, are not reported.

For example, a prominent kind of error is "selection bias" or "non-response error" that comes about because the people who choose to answer the survey or poll may be systematically different from the people who choose not to answer.  Alas, response rates to surveys have been falling quite dramatically over time, even for "gold standard" government surveys (see this paper or listen to this podcast).  Curiously, those nightly news polls don't tell you the response rate, but my guess is that it is typically far less than 10% - meaning that less than 10% of the people they tried to contact actually told them whether they intend to vote for Trump or Clinton or someone else.  That means more than 90% of the people they contacted wouldn't talk to them.  Is there something special about the ~10% willing to talk to the pollsters that makes them different from the ~90% of non-respondents?  Probably.  Respondents are probably much more interested in and passionate about their candidate and politics in general.  And yet, we - the consumers of polling information - are rarely told anything about this potential error.

One way pollsters try to partially "correct" for non-response error is through weighting.  To give a sense for how this works, consider a simple example.  Let's say I surveyed 1,000 Americans and asked whether they prefer vanilla or chocolate ice cream.  When I get my data back, I find that there are 650 males and 350 females.  Apparently males were more likely to take my survey.  Knowing that males might have different ice cream preferences than females, I know that my answer about the most popular ice cream flavor will likely be biased if I don't do something.  So, I can create a weight.  I know that the true proportion of the US population is roughly 50% male and 50% female (in actuality, there are slightly more females than males, but let's put that to the side).  So, what I need to do is make the female respondents "count" more in the final answer than the males.  When we typically take an average, each person has a weight of one (we add up all the answers - implicitly multiplied by a weight of one - and divide by the total).  A simple correction in our ice cream example would be to give females a weight of 0.5/0.35 = 1.43 and males a weight of 0.5/0.65 = 0.77.  Females will count more than one and males will count less.  And, I report a weighted average: add up all the female answers (each multiplied by a weight of 1.43), add to them all the male answers (each multiplied by 0.77), and divide by the total.  
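
Here is the same weighting arithmetic as a short sketch; the chocolate-versus-vanilla percentages are made up purely for illustration:

```python
# Hypothetical ice cream poll: 650 male and 350 female respondents out of 1,000,
# with the target population assumed to be 50% male and 50% female.
sample_counts = {"male": 650, "female": 350}
population_share = {"male": 0.50, "female": 0.50}

# Post-stratification weight = population share / sample share.
weights = {g: population_share[g] / (sample_counts[g] / 1000) for g in sample_counts}
print(weights)  # male ~0.77, female ~1.43

# Made-up answers: share of each group preferring chocolate.
prefers_chocolate = {"male": 0.40, "female": 0.60}

# Unweighted average: males dominate because there are more of them in the sample.
unweighted = sum(sample_counts[g] * prefers_chocolate[g] for g in sample_counts) / 1000

# Weighted average: each group now counts in proportion to its population share.
weighted = sum(sample_counts[g] * weights[g] * prefers_chocolate[g] for g in sample_counts) / 1000

print(f"Unweighted: {unweighted:.1%}, weighted: {weighted:.1%}")  # 47.0% vs. 50.0%
```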

Problem solved, right?  Hardly.  For one, gender is not a perfect predictor of ice cream preference.  And the reason someone chooses to respond to my survey almost certainly has something to do with more than gender.  Moreover, weights can only be constructed using variables for which we know the "truth" - or have census bureau data that reveals the characteristics of the whole population.  But, in the case of political polling, we aren't trying to match up with the universe of U.S. citizens but the universe of U.S. voters.  Determining the characteristics of voters is a major challenge, and those characteristics are in constant flux.  

In addition, when we create weights, we could end up with a few people having a disproportionate effect on the final outcome - dramatically increasing the possible error rate. Yesterday, the New York Times ran a fantastic story by Nate Cohn illustrating exactly how this can happen.  Here are the first few paragraphs:

There is a 19-year-old black man in Illinois who has no idea of the role he is playing in this election.

He is sure he is going to vote for Donald J. Trump.

And he has been held up as proof by conservatives — including outlets like Breitbart News and The New York Post — that Mr. Trump is excelling among black voters. He has even played a modest role in shifting entire polling aggregates, like the Real Clear Politics average, toward Mr. Trump.

How? He’s a panelist on the U.S.C. Dornsife/Los Angeles Times Daybreak poll, which has emerged as the biggest polling outlier of the presidential campaign. Despite falling behind by double digits in some national surveys, Mr. Trump has generally led in the U.S.C./LAT poll. He held the lead for a full month until Wednesday, when Hillary Clinton took a nominal lead.

Our Trump-supporting friend in Illinois is a surprisingly big part of the reason. In some polls, he’s weighted as much as 30 times more than the average respondent, and as much as 300 times more than the least-weighted respondent.

Here's a figure they produced showing how this sort of "extreme" weighting affects the polling result reported:

The problem here is that when one individual in the sample counts 30 times more than the typical respondent, the effective sample size is actually much smaller than the actual sample size, and the "margin of error" is much higher than +/- 3%.
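
One common way to quantify this is Kish's "effective sample size," n_eff = (Σw)² / Σw².  Here is a small sketch with invented weights, just to show how a single heavily weighted respondent shrinks the effective sample and inflates the margin of error:

```python
import math

def effective_sample_size(weights):
    """Kish's approximation: n_eff = (sum of weights)^2 / (sum of squared weights)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a proportion with an (effective) sample size of n."""
    return z * math.sqrt(p * (1 - p) / n)

# Invented example: 999 respondents with weight 1 plus one respondent weighted 30x.
weights = [1.0] * 999 + [30.0]

n_eff = effective_sample_size(weights)
print(f"Nominal n = {len(weights)}, effective n = {n_eff:.0f}")
print(f"Margin of error: nominal {margin_of_error(len(weights)):.1%}, "
      f"effective {margin_of_error(n_eff):.1%}")
```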

There are many additional types of biases and errors that can influence survey results (e.g., How was the survey question asked? Is there an interviewer bias? Is the sample drawn from a list of all likely voters?).   This doesn't make polling useless.  But, it does mean that one needs to be a savvy consumer of polling results.  It's also why it's often useful to look at aggregations across lots of polls or, my favorite, betting markets.

Value of Nutritional Information

There is a general sense that nutritional information on food products is "good" and "valuable."  But, just how valuable is it?  Are the benefits greater than the costs?

There have been a large number of studies that have attempted to address this question, and all have significant shortcomings.  Some studies just ask people survey questions about whether they use or look at labels.  Other studies have tried to look at how the addition of labels changes purchase behavior - but the focus here is typically limited to only a handful of products. As noted in an important early paper on this topic by Mario Teisl, Nancy Bockstael, and Alan Levy, nutritional labels don't have to cause people to choose healthier foods to be valuable.  Here is one example they give:

consider the individual who suffers from hypertension, has reduced his sodium intake according to medical advice, and believes his current sodium intake is satisfactory. If this individual were to learn that certain brands of popcorn were low in salt, then he may switch to these brands and allow himself more of some other high sodium food that he enjoys. Better nutritional information will cause changes in demand for products and increases in welfare even though it may not always cause a backwards shift in all risk increasing foods nor even a positive change in health status.

This is why it is important to consider a large number of foods and food choices when trying to figure out the value of nutritional labels.  And that's exactly what we did in a new paper just published in the journal Food Policy.  One of my Ph.D. students, Jisung Jo, used some data from an experiment conducted by Laurent Muller and Bernard Ruffieux in France to estimate consumers' demands for 173 different food items in an environment where shoppers made an entire day's worth of food choices.  This lets us calculate the value of nutritional information per day (not just per product).  

The nutritional information we studied relies on two simple nutritional indices created by French researchers.  They are something akin to the NuVal label system or a traffic light system.  We first asked people where they thought each of the 173 foods fell on the nutritional indices (and we also asked how tasty or untasty each of the foods was), and then, after they made a day's worth of (non-hypothetical) food choices, we told them where each food actually fell.   Here's a bit more detail.  

The initial “day 1” food choices were based on the individuals’ subjective (and implicit) health beliefs. Between days 1 and 2, we sought to measure those subjective health beliefs and also to provide objective information about each of the 173 foods. The beliefs were measured by asking respondents to pick the quadrant in the SAIN (Nutrient Adequacy Score for Individual foods) and LIM (for Limited Nutrient) table (Fig. 2) that best described where they thought each food fit. The SAIN and LIM are nutrient profiling models and indices introduced by the French Food Safety Agency. The SAIN score is a measure of “good” nutrients calculated as an un-weighted arithmetic mean of the percentage adequacy for five positive nutrients: protein, fiber, ascorbic acid, calcium, and iron. The LIM score is a measure of “bad” nutrients calculated as the mean percentage of the maximum recommended values for three nutrients: sodium, added sugar, and saturated fatty acid. Since indices help reduce search costs, displaying the information in the form of an index is a way to make the information available in an objective way but also allows consumers to better compare the many alternative products in their choice set.
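
To give a rough sense of how index-style scores like these are built, here is a sketch; the reference values and the example food are placeholders I made up, not the official SAIN/LIM parameters:

```python
def sain_score(food, adequacy_refs):
    """SAIN-style score: unweighted mean of percentage adequacy for the 'good'
    nutrients (protein, fiber, ascorbic acid, calcium, iron)."""
    pcts = [100 * food[n] / adequacy_refs[n] for n in adequacy_refs]
    return sum(pcts) / len(pcts)

def lim_score(food, max_refs):
    """LIM-style score: mean percentage of the maximum recommended values for the
    'bad' nutrients (sodium, added sugar, saturated fatty acids)."""
    pcts = [100 * food[n] / max_refs[n] for n in max_refs]
    return sum(pcts) / len(pcts)

# Placeholder reference values (per reference portion) and a made-up food item.
good_refs = {"protein": 50, "fiber": 25, "ascorbic_acid": 90, "calcium": 1000, "iron": 14}
bad_refs = {"sodium": 2300, "added_sugar": 50, "saturated_fat": 20}
food = {"protein": 10, "fiber": 3, "ascorbic_acid": 5, "calcium": 120, "iron": 1,
        "sodium": 400, "added_sugar": 12, "saturated_fat": 4}

print(f"SAIN-style score: {sain_score(food, good_refs):.1f}")
print(f"LIM-style score:  {lim_score(food, bad_refs):.1f}")
```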

Here are the key results:

In this study, we found that nutrient information conveyed through simple indices influences consumers’ grocery choices. Nutrient information increases willingness-to-pay (WTP) for healthy food and decreases WTP for unhealthy food. The added certainty provided by objective nutrient information increased the marginal WTP for healthy food. Moreover, there is a sort of loss aversion at play in that WTP for healthy vs. neutral food is lower than WTP for neutral vs. unhealthy food, and this loss aversion increases with information. . . . This study estimated the value of the nutrient index information at €0.98/family/day. The advantage of our approach is that the value of information reflects choices over a larger number of possible foods and represents an aggregate value over the whole day.

I should also note that people valued the taste of their food as well.  We found consumers were willing to pay 4.33 euros/kg more for a one-unit increase on the -5 to +5 taste scale.  To put this number in perspective, let's take a closer look at the average taste rating given to all 173 food items. Most items had a mean rating above zero. The highest rated items on average were items like tomatoes (+4.1), green salad (+4), and zucchini (+3.9). The lowest rated items on average included cheese spread (-0.2) and Orangina light (-1.9). [remember: these were French consumers] Moving from one of the lower to one of the higher rated items would induce roughly a four-point change on the taste scale, associated with a change in economic value of 4.33 × 4 = 17.32 euros/kg.

Real World Demand Curves

On a recent flight, I listened to the latest Freakonomics podcast in which Stephen Dubner interviewed the University of Chicago economist Steven Levitt about some of his latest research.  The podcast is mainly about how Levitt creatively estimated demand for Uber and then used the demand estimates to calculate the benefits we consumers derive from the new ride sharing service.  

Levitt made some pretty strong statements at the beginning of the podcast that I just couldn't let slide.  He said the following:

And I looked around, and I realized that nobody ever had really actually estimated a demand curve. Obviously, we know what they are. We know how to put them on a board, but I literally could not find a good example where we could put it in a box in our textbook to say, “This is what a demand curve really looks like in the real world,” because someone went out and found it.

As someone who has spent the better part of his professional career estimating consumer demand curves, I was a bit surprised to hear Levitt claim "nobody ever had really estimated a demand curve."  He also said, "we completely and totally understand what a demand curve is, but we’ve never seen one."  The implication seems to be that Levitt is the first economist to produce a real world estimate of a demand curve.  That's sheer baloney.  

The most recent Nobel prize winner in economics, Angus Deaton, is perhaps most well known for his work on estimating consumer demand curves.

In fact, agricultural economists were among the first people to estimate real world demand curves (see this historical account I coauthored a few years ago).  Here is a screenshot of a figure from a 1924 paper by Schultz in the Journal of Farm Economics estimating demand for beef.  Yes - in 1924!  I'm pretty sure that figure was hand drawn!

Or, here's Working in a paper in the Quarterly Journal of Economics in 1925 estimating demand for potatoes.

Two years later in 1927, Working's brother was perhaps the first to discuss "endogeneity" in demand (how do we know we're observing a demand curve and not a supply curve?), an insight that had a big influence on future empirical work.

Fast forward to today and there are literally thousands of studies that have estimated consumer demand curves.  The USDA ERS even has a database which, in their words, "contains a collection of demand elasticities - expenditure, income, own price, and cross price - for a range of commodities and food products for over 100 countries."   

Here is a figure from one of my papers, where the demand curve is cleanly identified because we experimentally varied prices.  
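
For readers curious what "estimating a demand curve" looks like in practice, here is a bare-bones sketch with simulated data (not the data from my paper): regress log quantity on log price, and when prices are varied experimentally the slope can be read as the price elasticity of demand.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: prices assigned experimentally; quantities respond with an
# assumed true price elasticity of -0.8 plus noise.
log_price = np.log(rng.uniform(1.0, 10.0, size=500))
log_quantity = 3.0 - 0.8 * log_price + rng.normal(0, 0.2, size=500)

# OLS of log quantity on log price: the slope is the estimated elasticity.
X = np.column_stack([np.ones_like(log_price), log_price])
intercept, elasticity = np.linalg.lstsq(X, log_quantity, rcond=None)[0]

print(f"Estimated demand curve: log(q) = {intercept:.2f} + ({elasticity:.2f}) * log(p)")
```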

And, of course, I've been doing a survey every month for over three years where we estimate demand curves for various food items.

In summary, I haven't the slightest idea what Levitt is talking about.  

Consumer Research and Big Data

It's been a great week in Boston at the Agricultural and Applied Economics Association (AAEA) annual meeting.  It's always good to see old friends, meet new ones, and learn about a wide array of topics.  

This year, I had the privilege of taking over as president and giving the AAEA presidential address.  I chose to talk about new and emerging data sets that are being used in consumer research.  I presented several short studies using data from the Food Demand Survey (FooDS) to illustrate how we might garner new insights about consumer heterogeneity and demand using new datasets.  A working draft of the paper is here. [Note: I've updated the paper (new draft here) in response to some comments, and some of the elasticity figures have changed because I found a small error in my code.]   I welcome any comments.   

A few key lessons.  First, there are big differences across consumers in their demands for food at home and away from home, and larger datasets with a lot of cross-sectional and temporal variability reveal that the "representative consumer" hypothesis is probably false.  Here's a plot showing the distribution of the income elasticities of demand for food at home and away from home (i.e., how much additional food at home or away from home a household buys as its income increases). For some households, food at home is a "normal" good (they buy more when they make more), but for other households, food at home is an "inferior" good (they buy less when they make more).  Food away from home is a normal good for more households than is food at home.

One of the main ways economists have studied consumer heterogeneity is by doing surveys.  However, almost all these surveys are conducted at a single point in time.  Thus, they present a "snapshot" of consumer preferences.  Using my survey data, however, I showed (using a so-called choice experiment repeated monthly) that these typical survey approaches might miss a lot of variability over time.   

Finally, one of the problems with many consumer research data sets is that they are not large enough to allow us to learn much about small segments of the population.  If one wants to learn about people with Celiac disease, for example, then a survey of a random sample of 1,000 people will only turn up roughly 20 people with the disease - hardly enough to say anything meaningful.  

In FooDS, we've been asking whether people are vegetarians or vegans for over three years now.  This group only represents about 5% of the population, so one needs a large data set to describe its characteristics.  I used a machine learning method (a classification tree) to predict whether a person self-identified as vegetarian or vegan.  Here's what turned up.  Vegetarians tend to be very liberal, on SNAP (aka "food stamps"), with relatively high incomes, and with children under 12 in the house.  
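
For those curious about the mechanics, a classification tree like this can be fit in a few lines.  The sketch below uses made-up data standing in for the FooDS demographics; it is not the actual FooDS data or code:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(42)
n = 5000

# Made-up stand-ins for survey demographics: political ideology (higher = more
# liberal), SNAP participation, household income, and kids under 12 at home.
X = np.column_stack([
    rng.integers(1, 6, n),       # ideology, 1-5 scale
    rng.integers(0, 2, n),       # on SNAP (0/1)
    rng.uniform(10, 200, n),     # household income, $1,000s
    rng.integers(0, 2, n),       # child under 12 in household (0/1)
])

# Fake outcome: vegetarian/vegan for roughly 5% of respondents, more likely
# among the most liberal respondents.
prob = 0.03 + 0.05 * (X[:, 0] >= 4)
y = rng.random(n) < prob

tree = DecisionTreeClassifier(max_depth=3, class_weight="balanced").fit(X, y)
print(export_text(tree, feature_names=["ideology", "snap", "income", "kid_under_12"]))
```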

These are just a few examples of the growing number of questions economists can now start to answer as we get our hands on larger, richer datasets.  

Mandatory GMO Labeling Closer to Reality

I've written a lot about mandatory labeling of genetically engineered foods over the past couple of years, and given current events, I thought I'd share a few thoughts about ongoing developments.  Given that the Senate has now passed a mandatory labeling law, and discussion has moved to the House, it appears the stars may be aligning such that a nationwide mandatory GMO labeling law will become a reality.  

The national law would preempt state efforts to enact their own labeling laws, and it would require mandatory labeling of some genetically engineered foods (there are many exemptions and it is unclear whether the mandatory labels would be required on only foods that contain genetic material or also those - such as oil and sugar - which do not).  Food manufacturers and retailers can comply with the law in a variety of ways including on-package labeling and via QR codes.  Smaller manufacturers can comply by providing a web link or phone number for further information.  

Many groups that have, in the past, advocated for mandatory labeling are against the bill because, they say, it doesn't go far enough (e.g., this group is upset because it doesn't "drive Frankenfoods . . . off the market"). Other folks, opposed to mandatory labeling in the first place, also don't like the bill because of philosophical opposition to singling out a technology that poses no added safety risks.  

I suppose this is how democracy works.  Compromise.  Neither side got everything they wanted, but at least from my perspective, this is a law that provides some form of labeling, which will hopefully shelve this issue and allow us to move on to more important things in a way that is likely to have the least detrimental economic effects.   

I'm sympathetic to the arguments made by folks who continue to oppose mandatory labeling on the premise that our laws shouldn't be stigmatizing biotechnology.  Because a GMO isn't a single "thing," I agree the law is unhelpful insofar as giving consumers useful information about safety or environmental impact.  The law is also a bit hypocritical in exempting some types of GMOs and not others.  One might also rightfully worry about when the government should have the power to compel speech and when it shouldn't.  And, I think we should be worried about laws that potentially hinder innovation in the food sector.  

But, here's the deal.  The Vermont law was soon going into effect anyway. The question wasn't whether a mandatory labeling law was going into effect but rather what kind.   The Vermont law was already starting to have some impact in that state and would likely have had nationwide impacts.  Moreover, there didn't seem to be a practical legal or legislative way to prevent the law from going into effect in the foreseeable future.  

The worst economic consequences of mandatory labeling would have come about from the types of labels most likely to be perceived by consumers as a "skull and crossbones."   In my mind, the current Senate bill avoided this worst case scenario while giving those consumers who really want to know about GMO content a means for making that determination.  That doesn't mean some anti-GMO groups won't use the labels as a way of singling out for protest companies that use foods and ingredients made with the technology, but at least the motives are more transparent in this case.  For some groups it was never about labeling anyway - it was about opposition to the technology.  That, in my opinion, is a much less tenable position, and one that will hopefully be less successful in the long run.