Big Fat Surprise

I just finished reading Nina Teicholz’s best-selling book The Big Fat Surprise, which takes issue with our long-held belief that low-fat diets in general, and diets free of animal fat in particular, best promote good health.

It’s been an enjoyable read, and the history of the development of our dietary beliefs and guidelines is both fascinating and eye-opening.  There is a bit of a tendency in the book for the author to nitpick any study which doesn’t support her hypothesis without applying the same skepticism to those studies which do support it, but overall, she makes a compelling case.  I probably found chapter 10 on "Why Saturated Fat is Good For You" most interesting in that regard.  Teicholz lays bare the sad state of affairs associated with the science behind much of the nutritional advice we’re given.  One takeaway is that we really don’t know as much as is often presumed about what sorts of diets increase or decrease the chances of a heart attack or cancer.

There is one nit I want to pick with a phrase in Teicholz’s book.  It is a technical one, but because it is the sort of thing I expect my students to fully understand, I'll delve into it.  On page 167 of the paperback version she writes (about an epidemiological study finding no relationship between breast cancer and consumption of dietary fat), “These conclusions were all associations.  But although epidemiology cannot demonstrate causation, it can be used to reliably show the absence of a connection.” (the emphasis is hers)

That claim is patently false (I'm presuming by "connection" she means "causation").  The trouble with the sort of correlation analysis used in many epidemiology studies is that of missing variables.  We can't observe everything about people's behaviors or about the effects of dietary changes, and that results in "omitted variable bias."  That bias can inflate or reduce the size of a measured effect.   In fact, contrary to Teicholz's claims, omitted variable bias can make a "real" effect look like nothing.

Wikipedia describes the problem, but similar treatments can be found in almost any introductory econometrics textbook.     

Suppose we have the following true relationship:

y=b0 + b1*x + b2*z + e

where y is the chance of breast cancer among women, x is amount of fat consumed, and z is a personality trait reflecting the person's overall health conscientiousness.  The "true" relationship we want to know is given by b1.  

But, suppose we only observe y and x and we don't observe z.  Also suppose that z is related to x in the following way: z = a0 + a1*x + u.  Substituting this equation into the first means that when the epidemiologist runs their analysis they calculate:

y = b0 + b1*x + b2*(a0 + a1*x + u) + e

or, re-writing:

y = (b0 + b2*a0) + (b1 + b2*a1)*x + (b2*u + e).

So, the researcher looks at the relationship between x and y, and thinks they're estimating the "true" effect b1, but in reality, they're estimating the effect (b1+b2*a1), which could be larger or smaller than b1.  
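The same result can be written in the covariance form that econometrics textbooks typically use. This is only a sketch, and it leans on two assumptions that go beyond what is stated above: that the error terms e and u are both uncorrelated with x. Under those assumptions, the slope from regressing y on x alone converges to:

```latex
\[
\frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)}
  = b_1 + b_2 \, \frac{\operatorname{Cov}(x, z)}{\operatorname{Var}(x)}
  = b_1 + b_2 a_1 ,
\]
% The last step uses z = a_0 + a_1 x + u with Cov(x, u) = 0,
% so that Cov(x, z) / Var(x) = a_1.
```

The bias term b2*a1 disappears only if z has no effect on y (b2 = 0) or z is unrelated to x (a1 = 0).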

Suppose b2 takes the negative value of -1.5 (more conscientious women are less likely to develop breast cancer) and a1 is also negative and takes the value of -2 (women who eat more fat tend to be the less conscientious ones, who pay less attention to all that health advice).  This means the effect b2*a1 is positive at the value of +3.  But b1 could be negative (more fat = less breast cancer).  Say, b1=-3.  The positive effect of b2*a1=+3 then exactly offsets the negative effect of b1=-3, so the estimated effect is 3-3=0.  It will look like there is no effect even though there really is one.  Even if the effects don't precisely offset each other, the estimated effect could be small enough that the researcher concludes it isn't statistically different from zero.
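The cancellation can be checked with a small simulation. What follows is a minimal sketch in Python, not anything taken from the book or from a real study: the sample size, the intercepts, the standard normal noise, and the helper names `cov` and `simulate` are all my own assumptions. Only the product b2*a1 enters the bias, so the sketch runs both sign combinations that give b2*a1 = +3.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists (cov(a, a) is the variance)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / len(a)


def simulate(b1, b2, a1, n=50_000, seed=0):
    """Draw data from y = b0 + b1*x + b2*z + e with z = a0 + a1*x + u.

    Returns two estimates of the slope on x:
      naive      - from regressing y on x alone (z omitted)
      controlled - from regressing y on both x and z
    The intercepts b0 = 1.0 and a0 = 0.5 are arbitrary (assumed) values.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    z = [0.5 + a1 * xi + rng.gauss(0, 1) for xi in x]
    y = [1.0 + b1 * xi + b2 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)

    # One-regressor OLS slope: Cov(x, y) / Var(x).
    naive = sxy / sxx
    # Two-regressor OLS slope on x, from the normal equations.
    controlled = (sxy * szz - szy * sxz) / (sxx * szz - sxz ** 2)
    return naive, controlled


# True effect b1 = -3 in both runs; the two (b2, a1) pairs share b2*a1 = +3.
for b2, a1 in [(1.5, 2.0), (-1.5, -2.0)]:
    naive, controlled = simulate(b1=-3.0, b2=b2, a1=a1)
    print(f"b2={b2:+.1f}, a1={a1:+.1f}: naive slope = {naive:+.3f}, "
          f"slope controlling for z = {controlled:+.3f}")
```

In both runs the naive slope should come out close to zero, while the regression that includes z should recover something close to the true b1 of -3. Of course, the second regression is exactly the one an epidemiologist cannot run when z is unobserved.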

Now, I'm not saying that there is a relationship between fat consumption and breast cancer - rather, I'm just making a conceptual point that omitted variables can result in upward or downward bias.  What I can more confidently say is that only the last part of Teicholz's claim is right:  "epidemiology cannot demonstrate causation."  

Now, there are regression methods that can get us much closer to the truth, but I don't often see these used in epidemiology studies.  In economics, the so-called "credibility revolution" has led to more specification testing and more attention to causal identification using instrumental variables, regression discontinuity designs, difference-in-differences, and other methods.  A good introduction to the topics and methods is given in Mostly Harmless Econometrics.