
Evaluating Medical Claims

  • May 2, 2020
  • 7 min read





A Plethora of Claims

"NAD+ makes you live longer." "Magnesium will help you sleep." "Creatine will help your memory." These are supplement claims, and we are bombarded by them every day. Yet they barely scratch the surface of the 90,000 supplements on the market. And if we include surgery, pharmaceuticals, mental health, lifestyle medicine, alternative medicine, longevity, 'wellness', and the history of medical claims, there are millions of individual claims - each one claiming it works.


The Need for a Standardized Process

What this highlights is that in our desperate pursuit of health we are all easily fooled - patients and experts alike. Consequently, as a patient, it's imperative that you learn how to evaluate medical claims for yourself. Below, we will take a bird's-eye view of health claims, show you how we're being fooled, and give you a simple, standardized process you can use to evaluate any claim.










Why We Have a Plethora of Claims



A claim is only as good as the evidence that supports it. The vast majority of medical claims are based on just a few categories of evidence, and it's important to familiarize yourself with them:


Categories of Evidence

  • In Vitro Experiments: Trying an intervention in a test tube ('In vitro' means in glass).

  • Animal Experiments: Trying an intervention on animals. 'Rat studies'.

  • Mechanism of Action: Speculating how the intervention works.

  • Expert Opinion: The opinion of an expert when better evidence is unavailable.

  • Anecdotes: A patient's informal trial of an intervention on themselves.

  • Case Series: A doctor's formal trial of an intervention on patients.

  • Observational Studies: Trends in population data sets (e.g. processed meat eaters have higher rates of colon cancer).


Cheap and Easy to Generate

The reason we have a plethora of claims is that these categories of evidence are all cheap and easy to generate. There are millions of anecdotes, millions of test tube experiments, millions of individual experts, and millions of data points in observational data sets (big data). Every claim can find evidence to support it.





The Invention of the Gold Standard



The categories of evidence all harbor major limitations - limitations that lead to false positive results. We are fooled by anecdotes, for example, because of placebo effects and natural healing. We are fooled by in vitro and animal experiments because what works in a test tube often does not work in a human. And we are fooled by observational studies because people who eat processed meat have other bad habits that may be increasing their cancer risk (confounding).


Engineering a Better Test

Good science is about challenging our ideas with better and better tests to try to prove them WRONG. Medical science definitely needed a better test. The result: the randomized, double-blind, placebo-controlled trial, or RCT for short. It is the most direct test of a medical claim, engineered to account for the limitations of the other categories of evidence.


The RCT

The RCT is a large trial in humans - not animals or test tubes. The subjects are split randomly into two identical groups to avoid confounding. The only difference between the groups is that one gets the intervention and the other gets a placebo. Neither the subjects nor the experimenters know who is in which group ('double-blind'). The subjects are followed for a long period of time until real events happen - events like death or heart attack, not just changes in biomarkers. The first truly randomized trial, published in 1948, tested whether streptomycin could treat tuberculosis; since then, RCT output has grown exponentially to roughly 50,000 per year.


Limitations of the RCT

We can't use the RCT to evaluate every claim. RCTs are very expensive and very difficult to execute. In fact, sometimes they're simply unfeasible. If we wanted to test, for example, whether the carnivore diet or the vegan diet prevents cancer, it would be near impossible to keep several thousand people compliant for several years - and who would pay for it?


The Gold Standard

However, what we CAN use the RCT for is as a yardstick, or 'gold standard', to evaluate the other categories of evidence. A test to evaluate the other tests. We've mentioned that the other categories of evidence are unreliable, but just HOW unreliable are they?







Meta-research: Evaluating the Categories of Evidence




To see whether animal studies are reliable, we can find a medical claim based on an animal study and check whether it was confirmed in an RCT. Doing this on a grand scale - taking a bird's-eye view of research - is called meta-research: research about research. In healthcare, there is fortunately a wealth of data to analyze. PubMed, a service of the National Library of Medicine, has been storing the published studies in biomedicine since the 1960s; there are now over 35 million papers. With this, we can follow literally hundreds of thousands of medical claims from inception to RCT.


Foundationalism: Calibrating the Other Kinds of Evidence


This lets us establish a theory of knowledge (an epistemology) for medicine: a hierarchy of evidence, in which some kinds of evidence are more reliable than others.






Why We Are Fooled



Unfortunately, all of these categories are imperfect, and each fools us in a different way into thinking a claim is true when it is not. Many things that work in animals or in a cell experiment do not work in humans. When something appears to work in a human - say, a friend improves after a treatment - up to 90% of common complaints get better on their own, and up to 35% of the response can be placebo. When we look at large groups of people and spot trends, like those who exercise having better health than those who do not, the people who exercise also don't smoke, are wealthier, eat better, and are not obese. How do we know which factor(s) caused the better health? This is called confounding.







The Hierarchy of Evidence






From the above findings, we can establish a hierarchy of evidence, or evidence pyramid (see diagram below). The pyramid shape highlights that most medical claims are based on the weakest and easiest kinds of evidence to produce - anecdotes and expert opinion - whereas very few are based on the strongest and most expensive evidence: an RCT, or multiple RCTs in a systematic review.




  1. Anecdotes are unreliable - Although some major medical discoveries have been inspired by anecdotes, like Botox and Viagra, they are the exception, and they were verified with RCTs. Anecdotes can be a starting point in medical research, not the endpoint.

  2. Expert opinion is unreliable - To be clear, the term "expert opinion" means an opinion derived from a single expert's clinical experience and physiologic reasoning, in the absence of stronger kinds of evidence, such as RCTs.

  3. Studies from academia are unreliable - Most animal studies and cell experiments come from academia. When we try to replicate these experiments, fewer than 25% can be replicated. This is known as “the replication crisis” and is well documented in many fields of science. It is primarily due to academia's reward system: researchers are rewarded only for positive findings. In one analysis of 2,000,000 published studies, 96% reported positive results. After all, no one ever won a Nobel Prize for showing a treatment does NOT work.





Success Rate (Past Performance)


A medical claim usually starts at the bottom of the pyramid with weak positive evidence, and as it goes through the gauntlet of stronger and more reliable kinds of evidence, it mostly gets disproven; this is called the decline effect.



Even when the preliminary research - animal studies and cell experiments - can be replicated, the likelihood of it working in an RCT is still very low. How do we know this? As a regulated industry, the pharmaceutical industry is forced to use RCTs to confirm its ideas, and less than 1% of those ideas work. In Alzheimer's disease, for example, the industry found 140 drug candidates over the last 30 years. None of them worked in large RCTs, at a cost of $600 billion.



Evaluating a Medical Claim


The process of approximating the likelihood that a medical claim is true is relatively simple: find the evidence for the claim and determine which level of the pyramid that evidence sits on - the lower it sits, the less likely the claim is to be true. (For a more detailed explanation of evaluating medical claims, check out my lecture on YouTube.)
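The process above can be sketched as a tiny lookup. To be clear, this is only an illustration: the category names and the probability numbers are hypothetical placeholders chosen to reflect the article's point that preclinical evidence rarely survives an RCT, not figures from the literature.

```python
# Rough prior that a claim is true, keyed by the highest level of the
# evidence pyramid its supporting evidence reaches. All numbers are
# illustrative assumptions, not published estimates.
EVIDENCE_PRIORS = {
    "anecdote": 0.005,
    "expert_opinion": 0.01,
    "in_vitro": 0.01,
    "animal": 0.01,
    "observational": 0.05,
    "rct": 0.50,
    "systematic_review": 0.80,
}

def likelihood_claim_is_true(best_evidence: str) -> float:
    """Return a rough prior for a claim given its best evidence category."""
    if best_evidence not in EVIDENCE_PRIORS:
        raise ValueError(f"Unknown evidence category: {best_evidence}")
    return EVIDENCE_PRIORS[best_evidence]

# A supplement claim backed only by animal studies gets a very low prior.
print(likelihood_claim_is_true("animal"))
```

The point of the sketch is the ordering, not the exact values: evidence higher on the pyramid should always map to a higher prior.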


As a medical professional, finding evidence and assessing what kind it is are relatively easy tasks. As a patient, this is far more challenging - and a perfect application for Artificial Intelligence.



Using AI to Help You Evaluate Claims


Despite AI’s reputation for being misleading and even hallucinating, if we learn to set the appropriate parameters and teach it HOW to think, it can be quite reliable. In AI lingo, this is called "prompt engineering". With respect to healthcare, we must "prompt engineer" the AI with our lessons from meta-research and direct it to look at the evidence, not expert opinion. Let me provide an example.


Suppose you want to find out whether the popular supplement NAD+ will make you live longer. Asking AI, "Will NAD+ make me live longer?" will likely produce the response, "It looks promising". (Try it for yourself.) This, of course, is the wrong answer, because you neglected to instruct the AI how to evaluate the claim. So it pulls together all the biased experts, from all the blogs and podcasts selling the claim, who far outweigh the actual evidence from published studies.


Instead, ask AI: "Taking into account the hierarchy of evidence, the replication crisis, and the pharmaceutical industry's success rate of less than 1%… will NAD+ make me live longer?" AI will now be forced to assess the level of the evidence for NAD+ and calculate a probability of NAD+ working based on that evidence. Since most of the research consists of animal studies and cell experiments, it will correctly inform you that "the likelihood NAD+ will make you live longer is very low".
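If you ask this kind of question often, the evidence-aware framing can be wrapped in a small helper so every question gets the same treatment. A minimal sketch - the template wording below is an illustration of the idea, not a tested or validated prompt:

```python
# Sketch of the "prompt engineering" step described above. The template
# wording is illustrative, not a validated prompt.
HIERARCHY_PROMPT = (
    "Taking into account the hierarchy of evidence, the replication "
    "crisis, and the pharmaceutical industry's success rate of less "
    "than 1%: {question} Base your answer on the published evidence, "
    "not expert opinion, and state which level of the evidence "
    "pyramid that evidence sits on."
)

def build_prompt(question: str) -> str:
    """Wrap a raw health question in evidence-aware instructions
    before sending it to an AI assistant."""
    return HIERARCHY_PROMPT.format(question=question)

print(build_prompt("Will NAD+ make me live longer?"))
```

The design choice here is simply consistency: by baking the meta-research lessons into the template, the framing cannot be forgotten on any individual question.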



Evaluating Medical Claims and Shared Decision-Making


Evaluating medical claims is only the beginning of making good healthcare decisions. Just because a claim has a low probability of being true doesn't mean you shouldn't try it. Many factors go into a healthcare decision: every patient is different, every situation is different, and every patient's tolerance for risk is different. Helping you decide which claims you should try and which you shouldn't is called shared decision-making. It is another essential critical thinking skill in medicine. You can read about it here.










