Consumer Research and Big Data

It's been a great week in Boston at the Agricultural and Applied Economics Association (AAEA) annual meeting.  It's always good to see old friends, meet new ones, and learn about a wide array of topics.

This year, I had the privilege of taking over as president and giving the AAEA presidential address.  I chose to talk about new and emerging data sets that are being used in consumer research.  I presented several short studies using data from the Food Demand Survey (FooDS) to illustrate how we might garner new insights about consumer heterogeneity and demand using new datasets.  A working draft of the paper is here. [Note: I've updated the paper (new draft here) in response to some comments, and some of the elasticity figures have changed because I found a small error in my code.]  I welcome any comments.

A few key lessons.  First, there are big differences across consumers in their demands for food at home and away from home, and larger datasets with a lot of cross-sectional and temporal variability reveal that the "representative consumer" hypothesis is probably false.  Here's a plot showing the distribution of the income elasticities of demand for food at home and away from home (i.e., how much additional food at home or away from home a household buys as its income increases). For some households, food at home is a "normal" good (they buy more when they make more), but for other households, it is an "inferior" good (they buy less when they make more).  Food away from home is a normal good for more households than is food at home.
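To make the normal-versus-inferior distinction concrete, here is a minimal sketch of an income elasticity calculation. It uses the textbook arc (midpoint) formula on two made-up observations for a single household. This is an illustration only, not the estimation method used in the paper, and the function names and numbers are hypothetical.

```python
def arc_income_elasticity(q0, q1, income0, income1):
    """Arc (midpoint) income elasticity: % change in quantity / % change in income."""
    pct_change_q = (q1 - q0) / ((q1 + q0) / 2)
    pct_change_income = (income1 - income0) / ((income1 + income0) / 2)
    return pct_change_q / pct_change_income


def classify_good(elasticity):
    """Label a good by the sign of its income elasticity."""
    if elasticity > 0:
        return "normal"    # household buys more as income rises
    if elasticity < 0:
        return "inferior"  # household buys less as income rises
    return "neither"


# Hypothetical household A: spends more on food at home after a raise.
e_a = arc_income_elasticity(q0=100, q1=110, income0=1000, income1=1200)
# Hypothetical household B: spends less on food at home after the same raise.
e_b = arc_income_elasticity(q0=100, q1=90, income0=1000, income1=1200)

print(classify_good(e_a))  # normal
print(classify_good(e_b))  # inferior
```

A distribution like the one in the plot arises when an elasticity of this kind is estimated separately for each household (or group of households) rather than once for a single "representative" consumer.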

One of the main ways economists have studied consumer heterogeneity is by doing surveys.  However, almost all these surveys are conducted at a single point in time.  Thus, they present a "snapshot" of consumer preferences.  Using my survey data, however, I showed (using a so-called choice experiment repeated monthly) that these typical survey approaches might miss a lot of variability over time.

Finally, one of the problems with many consumer research data sets is that they are not large enough to allow us to learn much about small segments of the population.  If one wants to learn about people with celiac disease, for example, then a survey of a random sample of 1,000 people will turn up only roughly 20 people with the disease, hardly enough to say anything meaningful.
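The arithmetic behind that example is easy to sketch. The snippet below assumes the 2% share implied by "roughly 20 in 1,000" above. The target of 400 respondents is an arbitrary number chosen for illustration, and the function names are hypothetical.

```python
import math
from fractions import Fraction


def expected_cases(sample_size, prevalence):
    """Expected number of respondents in a random sample who belong to the subgroup."""
    return sample_size * prevalence


def sample_needed(target_cases, prevalence):
    """Smallest random sample expected to contain at least target_cases subgroup members."""
    return math.ceil(target_cases / prevalence)


# Assumption: 20 per 1,000, the share implied by the example in the text.
prevalence = Fraction(20, 1000)

print(expected_cases(1000, prevalence))  # 20
print(sample_needed(400, prevalence))    # 20000
```

Under these assumptions, getting a few hundred respondents from a 2% subgroup takes a sample in the tens of thousands, which is the sense in which small segments require large data sets.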

In FooDS, we've been asking whether people are vegetarians or vegans for over three years now.  This group represents only about 5% of the population, so one needs a large data set to describe the characteristics of this group.  I used a machine learning method (a classification tree) to predict whether a person self-identified as vegetarian or vegan.  Here's what turned up.  Self-identified vegetarians and vegans tend to be very liberal, to be on SNAP (aka "food stamps"), to have relatively high incomes, and to have children under 12 in the house.
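For readers unfamiliar with classification trees, the core step is choosing the question that best separates the two groups, and then repeating that step within each resulting branch. Here is a minimal sketch of that single step, using Gini impurity as the splitting criterion. The six respondents are made-up toy data, not FooDS data, and the criterion and variable names are assumptions for illustration rather than the settings used in the paper.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means all labels are the same)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature, weighted_gini) for the yes/no feature that best separates labels.

    rows is a list of dicts mapping a feature name to True or False.
    """
    n = len(rows)
    best = None
    for feature in rows[0]:
        yes = [y for r, y in zip(rows, labels) if r[feature]]
        no = [y for r, y in zip(rows, labels) if not r[feature]]
        score = len(yes) / n * gini(yes) + len(no) / n * gini(no)
        if best is None or score < best[1]:
            best = (feature, score)
    return best


# Made-up toy respondents (NOT survey data); label 1 = self-identified vegetarian/vegan.
rows = [
    {"very_liberal": True, "kids_under_12": True},
    {"very_liberal": True, "kids_under_12": False},
    {"very_liberal": False, "kids_under_12": True},
    {"very_liberal": False, "kids_under_12": False},
    {"very_liberal": False, "kids_under_12": True},
    {"very_liberal": False, "kids_under_12": False},
]
labels = [1, 1, 0, 0, 0, 0]

feature, score = best_split(rows, labels)
print(feature, score)  # very_liberal 0.0
```

In this toy data the first question separates the groups perfectly, which real survey data would not do. A full tree keeps splitting each branch until the groups are small or no split improves the separation.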

These are just a few examples of the growing number of questions economists can now start to answer as we get our hands on larger, richer datasets.