The Problem with Observational Studies (Epidemiology)

By Jim Borgman for The Cincinnati Enquirer

Thousands of years ago, we built pyramids and aqueducts. Hundreds of years ago, we invented the printing press and telescope. In the 20th century we discovered penicillin and the structure of DNA.

Yet here we are in 2021 and we still can’t figure out what to eat.

Nutrition has become as controversial a topic as any in science. What’s healthy and what’s not depends on who (and when) you ask. Many foods that are considered harmful by some groups, like soy and grains, or meat and eggs, are considered the cornerstone of a healthy diet by others. Similarly, dietary patterns that were once promoted by health organizations, like low-fat high-carbohydrate snacks, are now vilified, leaving many people with more questions than answers.

  • Does saturated fat clog our arteries?
  • Should we embrace low-fat or high-fat diets?
  • Is low-carb the answer to our nutrition quandaries, or is eating more whole grains?
  • What’s better, a plant-based or a paleo diet?
  • Are soy-based burgers healthier than beef burgers?
  • Is fruit juice healthy, or is it a sugar bomb?
  • Are eggs good for us or are they too high in cholesterol?

Following the covers of TIME Magazine alone tells the story of our schizophrenic relationship with food.

In 1984, a plate of eggs and bacon made the cover of TIME Magazine, which featured an article warning readers about the perils of dietary cholesterol. And yet in 1999, TIME released a very different cover, suggesting that dietary cholesterol and eggs are in fact okay.

TIME did a similar flip-flop regarding fat. A 1961 issue of TIME discusses the dangers of a high-fat diet; in 2014, another cover features the caption, “Scientists labeled fat the enemy. Why they were wrong.”

TIME’s covers are reflections of the prevailing dietary winds of the time, influenced by ever-changing “conclusive results” from nutritional studies. One day, butter causes heart disease; the next day, it prevents diabetes.

The flip-flopping of mainstream dietary advice is largely explained by an over-reliance on what are called observational studies. Also known as cohort studies or nutritional epidemiology, these types of studies show correlation, but not causation, creating endless interesting hypotheses and few definitive answers.

The results of observational studies form the foundation of today’s nutrition policy, driving most of the dietary advice from the American Heart Association, World Health Organization, and US Dietary Guidelines. They also lead to attention-grabbing headlines that prove difficult to erase from our collective consciousness.

To better understand why observational studies are so inconsistent and unreliable, yet still prevalent, we must first understand what exactly an observational study is.

What is an observational study?

There are two primary ways of undertaking studies to find out what happens when we eat one thing versus another: observational studies and randomized trials.

Observational studies are the easier of the two options as they only require handing out questionnaires to people about their diet and lifestyle habits, and then following those people for a number of years to find out which habits are associated with different health outcomes. Observational studies lead to conclusions like:

"Brushing Teeth May Keep Away Heart Disease:
Study Shows People Who Brush Teeth Less Frequently Are at Higher Risk for Heart Disease"

The problem is, we don’t know whether teeth brushing is actually the thing preventing heart disease, or if people who have good oral hygiene happen to have healthier lifestyle habits in general, and therefore have less disease. The results of observational studies show correlation, not causation; they make catchy headlines, but are not causative evidence.

In randomized trials, two groups of randomly selected people are each assigned a different intervention. For example, one group may eat a diet with soybean oil as the primary source of fat, while the other group is assigned a diet with olive oil as the primary source of fat. Everything else about their diets and lifestyles remains unchanged.

Typically these randomized trials are also “blinded” in that neither group knows which intervention they’re receiving. For example, participants aren’t aware of whether their scrambled eggs are cooked in soybean oil or olive oil.

Because of the numerous problems with observational studies, which we’ll discuss below, only the results of well-designed randomized trials should be considered when creating dietary guidelines; however, flawed and biased observational studies have become the centerpiece of modern nutrition policy.

The problems with observational studies

In short, the primary issue with drawing conclusions from observational studies is that they’re often wrong. In fact, they may be wrong more often than they’re right.

In a 2017 analysis of 156 biomedical association studies reported by newspapers, only 48% of the studies were in any way validated by follow-up studies. To put it another way, according to Vox, “half of the studies you read about in the news are wrong.”

A 2011 paper in the Journal of the Royal Statistical Society puts it bluntly, “Any claim coming from an observational study is most likely to be wrong.”

While many studies are wrong due to poor design, even well-designed observational studies are inherently unreliable, for a number of reasons, the most important of which is bias.

A pivotal 2005 essay by professor John Ioannidis at the Stanford School of Medicine states, “for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias,” and continues, “even well-powered epidemiological studies may have only a one in five chance of being true.”

So the chance of any given observational study conclusion being true is somewhere between a dice roll and a coin flip.

Let’s talk about the reasons why:

Healthy user bias

The teeth brushing study mentioned earlier, which implies that frequent teeth brushing prevents heart disease, suffers from what is called healthy user bias.

Everyone knows that regular teeth brushing is the cornerstone of good oral hygiene, meaning that people who care about their health tend to brush their teeth more often. Therefore, frequent teeth brushers are also more likely to get regular checkups at the doctor, exercise more frequently, eat healthier diets, have more close relationships with friends who aren’t scared off by bad breath, etc.

The group of people who frequently brush their teeth are inherently a healthier group of people. While it’s possible that less frequent teeth brushing is indeed a cause of heart disease, it’s also possible that the opposite is true. There’s no way to know for sure without a randomized trial, because teeth brushing and heart disease are linked by association or correlation, not by causation.
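This dynamic is easy to see in a toy simulation. In the Python sketch below (every probability is invented purely for illustration), each simulated person has a hidden “health consciousness” trait that drives both frequent brushing and a lower disease risk, while brushing itself has zero causal effect:

```python
import random

random.seed(0)

# Hypothetical toy model: a latent "health consciousness" trait drives BOTH
# frequent teeth brushing AND a lower chance of heart disease. Brushing
# itself has zero causal effect on disease here; all numbers are made up.
def simulate_person():
    health_conscious = random.random() < 0.5
    # Health-conscious people are far more likely to brush frequently...
    brushes = random.random() < (0.9 if health_conscious else 0.3)
    # ...and, via diet, exercise, checkups, etc., less likely to get sick.
    disease = random.random() < (0.05 if health_conscious else 0.20)
    return brushes, disease

people = [simulate_person() for _ in range(100_000)]

def disease_rate(group):
    return sum(d for _, d in group) / len(group)

brushers = [p for p in people if p[0]]
non_brushers = [p for p in people if not p[0]]

print(f"frequent brushers:   {disease_rate(brushers):.3f}")
print(f"infrequent brushers: {disease_rate(non_brushers):.3f}")
# Brushers show roughly half the disease rate of non-brushers even though
# brushing does nothing in this model: the gap is driven entirely by the
# hidden confounder.
```

An observational study run on this simulated population would dutifully report that brushing is associated with less heart disease, and it would be correct about the correlation while being completely wrong about the cause.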

By Brian Crain for The Washington Post

As would be expected, there are different healthy user biases in different cultures. If a different country’s government instructed citizens to avoid teeth brushing for better health, then it is likely that the people who don’t brush their teeth would actually have better health outcomes.

We see this rather unsurprising phenomenon in the study of cholesterol, eggs, and heart disease. Some observational studies of cholesterol-wary Americans show a correlation between egg consumption and heart disease risk, leading to headlines such as, “Each additional half-egg consumed per day is associated with a significant increase in heart disease risk.” However, the inverse association exists in China.

According to the American Heart Association, “in China, egg consumption is positively associated with socioeconomic indicators and physical activity, inversely associated with smoking, and generally correlated with other aspects of a healthy dietary pattern (ie, higher intake of fiber, vegetables, and fruit).”

In other words, healthy user bias comes into play in China, where healthier people who exercise more and eat more fruits and vegetables tend to also eat more eggs. In the US, on the other hand, eggs may fall victim to what is called unhealthy user bias.

Unhealthy user bias

Just as there is healthy user bias, there is also unhealthy user bias. For decades, public health advice has told us to limit cholesterol and opt for low cholesterol egg whites instead of high cholesterol egg yolks. The rebels that ignore such health advice and opt for high cholesterol foods tend to also be the people that ignore other health advice to exercise more, eat fruits and vegetables, stop smoking, and drink less alcohol.

In other words, the people who ignore public health advice, including limiting egg intake, are inherently an unhealthier group of people, and there’s no way to prove that the eggs themselves are the reason for their worse health.

By xkcd

As another example of unhealthy user bias, consider that the more ashtrays someone owns, the more likely they are to die of lung cancer. Obviously, the existence of ashtrays in someone’s house doesn’t cause cancer in and of itself, but ashtrays are associated with cigarette smoking, which most likely does cause cancer.

People who own a lot of ashtrays are inherently an unhealthier group of people. Just as everyone knows that teeth brushing is a healthy habit, everyone also knows that cigarette smoking, which typically requires ashtrays, is an unhealthy habit.

People who own ashtrays not only smoke more, they also ignore other health advice and are more likely to have poor diets, exercise less frequently, and may even brush their teeth less often. Ashtrays do not cause cancer, but they are correlated with unhealthy habits, like cigarette smoking, that do.

Confounding variables

In the case of teeth brushing and ashtray ownership, the associated lifestyle habits (diet, exercise frequency, number of cigarettes smoked per day, etc.) are called confounding variables, because they confound (mix up or confuse) cause and effect.

In order to work around pesky confounding variables, epidemiologists “control” for these variables with statistical tools. For example, if infrequent teeth brushers also smoke more, researchers may control for smoking by, in the simplest of examples, only comparing smokers with smokers and non-smokers with non-smokers.

Once doing so, the confounding variable is considered “controlled,” or “adjusted,” like the taming of a wild and unruly beast, and the press treat the adjustment as justification for believing the study:

"After adjusting the data for cardiovascular risk factors such as obesity, smoking, social class, and family heart disease history, the researchers found that people who admitted to brushing their teeth less frequently had a 70% extra risk of heart disease."

The problem is, while statisticians and epidemiologists do the best they can, there’s no way to truly control for every possible variable in someone’s life. Maybe people who don’t brush their teeth twice per day also stay up later and sleep less, exercise less, do more drugs, have fewer social relationships, drink more alcohol, or have any other number of dietary or lifestyle habits that could confound the results.

In addition to simply missing certain confounding variables, there’s also the issue of how confounders are controlled. Controlling for exercise, for example, requires much more nuance than “exercisers versus non-exercisers.” What counts as exercise? Does competitive thumb-wrestling count? What about walking? And how do joggers compare to sprinters, let alone triathletes? Controlling for every single aspect of a confounding variable like exercise, let alone all of the possible confounding variables in someone’s life or in the lives of tens of thousands of study participants, quickly becomes a fool’s errand.
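The fragility of adjustment can be sketched with another toy simulation (again, with made-up numbers). Below, health consciousness is a continuous trait, disease depends only on that trait, and brushing has no causal effect; the researchers, however, can only adjust for a crude yes/no proxy of the confounder:

```python
import random

random.seed(1)

# Hypothetical sketch of residual confounding. Health consciousness (hc) is
# a continuous latent trait; researchers can only measure a crude binary
# proxy ("exerciser"). Brushing has no causal effect on disease here.
def person():
    hc = random.random()                         # latent trait, 0..1
    brushes = random.random() < hc               # more conscious -> brushes more
    exerciser = hc > 0.5                         # the crude proxy in the data
    disease = random.random() < 0.25 * (1 - hc)  # disease depends ONLY on hc
    return brushes, exerciser, disease

people = [person() for _ in range(200_000)]

def rate(group):
    return sum(d for *_, d in group) / len(group)

# "Adjust" for the confounder by stratifying on the proxy:
for label, stratum in [("exercisers", True), ("non-exercisers", False)]:
    b  = [p for p in people if p[1] == stratum and p[0]]
    nb = [p for p in people if p[1] == stratum and not p[0]]
    print(f"{label}: brushers {rate(b):.3f} vs non-brushers {rate(nb):.3f}")
# Within BOTH strata, brushers still look healthier, because the binary
# proxy captures only part of the underlying continuous confounder.
```

Even after “controlling for exercise,” a spurious brushing benefit survives, because within each stratum the brushers are still, on average, more health-conscious than the non-brushers. This is residual confounding, and no amount of extra sample size fixes it.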

Food Frequency Questionnaires

If observational studies are all about observing what someone eats, and correlating that to disease, you may be wondering how researchers objectively and quantifiably determine what exactly goes into someone’s mouth over the course of many years or decades. Turns out, they don’t.

Using food frequency questionnaires, researchers rely on self-reporting and the subjective memories of study participants to estimate what they ate over the course of the previous year. Here’s an example of a modern food frequency questionnaire:

If you’re an avid kiwi eater like I am or a lover of papayas and persimmons, you’re out of luck. Those fruits, and dozens of others, are not included in this food frequency questionnaire (source).

How many 1/2-cup servings of strawberries did you eat per week last year?

If you’re not sure, don’t worry, you’re not alone. It’s nearly impossible for most people to remember what they had for dinner last Wednesday, let alone how many servings of a particular seasonal fruit they had in the previous year, averaged out on a weekly basis, measured in units of “slices” or a “small glass”.

Our faulty memories are also subject to bias, leading us to underestimate some foods and overestimate others. For example, in a 2008 study on food frequency questionnaires, participants were asked to report their intake of fruits and vegetables. Before completing the questionnaire, half of the participants received a letter describing the benefits of fruits and vegetables. Those who received the letter reported 40% higher intakes of fruits and vegetables, not because they actually ate more than participants who didn’t receive the educational letter, but simply because they were exposed to and influenced by the prevailing bias that vegetables and fruits are healthy.

That study points to what is happening on a national scale, even in “large, well controlled” observational studies that rely on self-reported data. People who are educated on healthy habits, including what to eat, tend to overestimate their consumption of foods generally thought to be healthy and underestimate their intake of foods thought to be unhealthy.

As Harvard-trained doctor Georgia Ede, MD points out in her excellent piece on the problem with epidemiological studies, “[I]nstead of neutral, objective, quantifiable measurements, we have forced, subjective, inaccurate estimates. These wild guesses become the ‘data’ that form the backbone of the entire study. No matter how sophisticated the statistics and analysis you apply to your research may be, your results will only be as good as the ‘data’ you are analyzing. As the saying goes: garbage in, garbage out.”

A 2015 paper in the International Journal of Obesity writes that self-reported data on diet are so poor that “they no longer have a justifiable place in scientific research.”

That’s a troubling conclusion considering that these data are the foundation of most observational studies, which inform much of our public health policies and recommendations.

Since the first American Heart Association Report in 1961 and the first Dietary Guidelines for Americans in 1980, people who are most educated on healthy diets not only eat more of the foods promoted by those guidelines as a result of healthy user bias, they also overestimate the amount of those foods they’re eating in food frequency questionnaires, leading to a self-fulfilling prophecy of what foods will result in better health outcomes.

Unsurprisingly then, foods recommended by early dietary guidelines from sixty years ago continue to fare well in large observational studies, including skim milk, egg whites, vegetable oils, and six daily servings of whole grains, not necessarily because those foods are indeed healthiest, but at least partly because healthy people think they’re healthiest.

Reliance on biomarkers

Determining whether a certain variable leads to death or disease requires following study participants for a long time, possibly many decades. Doing so is extremely time-consuming and expensive, and most researchers would like to see the results of their observational studies published during their careers rather than after retirement.

In order to avoid having to wait decades for results, researchers often rely on biomarkers like blood cholesterol levels, indicators that take only a few months to change, in order to deem a variable or intervention (like teeth brushing) good or bad. Eating fat is associated with high cholesterol levels? It must be bad. Eating more carbohydrates is associated with lower cholesterol? It must be good.

The problem is, our understanding of biomarkers and their relevance for health outcomes is constantly evolving.

Starting in the 1940s, epidemiological studies established a link between blood cholesterol levels and heart disease. Because high-fat diets were shown to raise cholesterol levels, it was thought that eating fat caused heart disease, and that low-fat foods were therefore inherently healthy.

A new era of nutrition groupthink began, whereby fats and any other foods that raised total cholesterol were labeled the enemy and “low-fat cookies, hard candy, gum drops, sugar, syrup” and other snacks that contained little or no fat were recommended by influential organizations like the American Heart Association.

As a result of observational studies, a 1995 pamphlet from the American Heart Association recommends choosing snacks such as low-fat cookies, candy, and sugar (source)

However, total blood cholesterol levels were only part of the story. It turns out there are different types of cholesterol: the “good cholesterol” (HDL) and the “bad cholesterol” (LDL). While high-fat diets raised total cholesterol, it was found that only the saturated fat in the diet raised LDL, providing a more nuanced understanding of the relationship between fat consumption and heart disease.

Fast forward to the 21st century and our understanding of cholesterol evolved again. Just as total cholesterol is made up of HDL and LDL, scientists discovered that LDL is also made up of different types: small dense LDL and large buoyant LDL, and each type has a significantly different impact on heart disease risk.

In February 2021, the Journal of the American Heart Association ran a study that concluded, “Our data indicate that small dense LDL is the most atherogenic [plaque-forming] lipoprotein parameter.” The study also stated that, “no parameter added significant risk information to the pooled cohort equation once sdLDL [small dense LDL] was in the model.”

In other words, once researchers took into account the different types of LDL, it was found that small dense LDL “contributed significantly” to heart disease risk, while total cholesterol, total LDL, large buoyant LDL, and triglycerides had no effect on heart disease risk once the analysis included small dense LDL.

All these years, it appears that it wasn’t high total cholesterol levels or even high LDL cholesterol leading to heart disease; rather, small dense LDL particles specifically have been the culprits behind increased heart disease risk. At least that’s our understanding for now, until our research on the subject inevitably evolves again.

By Harley Schwadron for Reader’s Digest

Therefore, when analyzing study data and determining which diet and lifestyle trends lead to disease and death, relying on biomarkers like total cholesterol and LDL cholesterol may lead to incorrect conclusions, since those outdated biomarkers only tell a fraction of the story.

Biomarkers are typically used when actual health outcomes, like death, are unavailable. While biomarkers can be unreliable and misleading, there’s no arguing about whether someone dies. For that reason, just as randomized trials are superior to observational studies, actual health outcomes are superior to biomarker data.

In some rare cases, a randomized trial is run long enough to measure both biomarkers and health outcomes. One such study is the Minnesota Coronary Experiment, a five-year double-blind randomized controlled trial published in 2016, which found that participants who increased their consumption of vegetable oil lowered their LDL cholesterol. However, the group with the lowest LDL cholesterol also had the highest rate of death from heart disease.

If the study had only measured LDL cholesterol, the conclusion might have been that vegetable oils prevent heart disease, but by measuring actual rates of heart disease and death, the opposite conclusion was reached.

Would you rather be alive with slightly higher levels of a possibly irrelevant biomarker, or dead with low LDL cholesterol?

Where are the randomized controlled trials?

As we’ve seen, the results of observational studies are almost entirely worthless from a policy standpoint, since they have as much chance of being wrong as of being right.

The problem is, recommendations are usually made from results of observational studies and the research often stops at the observational study, with no randomized trial performed to “prove” the observational study’s findings.

So, if observational studies are so problematic, why don’t we just rely on more randomized trials? It primarily comes down to money, time, and ethics.

Long-term randomized trials are extremely expensive to set up and run, and it can be difficult to find study funders. Unfortunately, it is often biased industry groups that foot the bill, and their biases show up in the study results. For example, as the NY Times reported, a 2013 analysis of beverage studies found that, “those funded by Coca-Cola, PepsiCo, the American Beverage Association, and the sugar industry were five times more likely to find no link between sugary drinks and weight gain than studies whose authors reported no financial conflicts.”

By Dan Wasserman for The Boston Globe

The sugar industry may happily pay for a study to determine whether sugar causes diabetes, but only if they can lend a hand in designing the study and the results align with their interests.

In the case of dietary fat research, it can take four years for the fats in our cells to resemble the fats in our diet, meaning that studies lasting only a week, month, or year may not tell the whole story. The longer a study takes, the more money it costs, and thus the harder it is to get funded.

Lastly, it becomes unethical to subject study participants to an intervention that is thought to lead to disease and poor health. In our teeth brushing example, researchers might consider it unethical to restrict half the participants to brushing only every other day.

In studying other variables, the consequences could be much, much worse than bad breath. That’s why we’ll never do a randomized trial on cigarette smoking. Once an association was made between cigarettes and lung cancer, it was deemed too unethical to randomly assign half of the participants to smoke a pack of cigarettes every day in order to “prove” that cigarette smoking causes cancer (and a host of other diseases).

Even if we did have an abundance of randomized trials that fit all the necessary criteria to provide unbiased and sufficient evidence of what’s healthy and what’s not, headlines from observational studies typically sound more dramatic, reliable, and powerful in comparison.

Because the size and duration of an observational study are limited only by how many people can fill out a food frequency questionnaire each year, observational studies often puff themselves up by boasting about their large number of participants and study length. For example, “a 40,000-person study spanning 20 years” sounds impressive and reliable, and a randomized trial could never match those numbers. However, if the observational nature of the study makes it inherently flawed due to bias and confounding, it doesn’t matter how big the study is or how long it lasted; it is still an unreliable study.

Proponents of observational studies typically use the “weight of the evidence” argument, a term borrowed from legal rulings, to argue that even if observational studies aren’t perfect, if there are enough of them that come to the same conclusion, then they’re probably right. But this logic is flawed. If a study is garbage, it doesn’t matter how many more garbage studies or meta-analyses are piled on top of it. A heaping pile of garbage, no matter how big, is still garbage.


The tides may finally be starting to turn against observational studies. After decades of promoting egg whites and low cholesterol diets based on results of observational studies, the American Heart Association, in a January 2020 publication on cholesterol, did the equivalent of throwing its hands in the air:

“With observational data, it is difficult to assess the relationship of any individual food independently of a dietary pattern. Thus, the observations for eggs may be confounded by other dietary components and lifestyle behaviors that covary [correlate] with eggs. […] Although evidence from observational studies examining the relationship between dietary cholesterol and CVD [heart disease] risk is inconsistent, the discrepant results are likely heavily contributed to by residual confounding. It is difficult to distinguish between the effect of dietary cholesterol per se and the effect of dietary patterns high in cholesterol.”

By Tom Toles for The Washington Post

While the limitations of observational studies are finally being acknowledged, that doesn’t mean that observational studies are useless. They can be a good first step in connecting dots. If two variables seem to be related, it’s worth looking into them more deeply with better-controlled studies. However, once seemingly important results from an observational study are published, it often takes decades to undo the damage.

A 1998 observational study, which was later retracted, claimed that children who got the measles (MMR) vaccine were more likely to develop autism. Since then, better research has repeatedly dismissed that connection, but many people still believe that vaccines cause autism, and the decrease in vaccinations has led to a resurgence of measles.

Similarly, observational studies have formed the backbone of recommendations to make vegetable oils a large part of our diet. As a result, vegetable oils now make up more than 20% of our calories (up from less than 1% of calories at the beginning of the last century), and the US dietary guidelines tell us we should continue consuming even more of them.

Only in the last few years have results from well-controlled randomized trials measuring actual health outcomes shown the opposite to be true–that consuming more vegetable oils results in a significantly higher risk of death. If dietary cholesterol and measles vaccines are any indication, we may live with the myth that vegetable oils are healthy for some time to come, even as the evidence based on randomized trials shows otherwise.

Whether it’s findings that eggs cause heart disease, vaccines cause autism, butter prevents diabetes, or your choice of breakfast cereal dictates your child’s gender, results of observational studies make great headlines, but to base the entirety of our nutrition policy off of biased observational studies is misguided at best and downright dangerous at worst.

While we may not all agree on what to eat any time soon, one thing we should agree on is to stop relying on observational studies to form so-called conclusive recommendations on nutrition.
