Critical Thinking and Evaluating Medical Claims

Islon Woolf MD
May 2, 2020

Updated: Apr 4

(this page is under construction)

The supplement NAD+ makes you live longer. Stem cell injections help knee pain. Red meat causes cancer. These are medical claims. Are they true?

Unfortunately, because medicine is so technical, patients are unable to evaluate these medical claims for themselves. This is the patient dilemma, and it leaves them with two main options: trust an expert to evaluate the claims, or trust the anecdotes and success stories of friends and acquaintances (or both).

The trouble is that both experts and anecdotes are unreliable. Think about supplements. There are 90,000 supplements currently on the US market. Each one is backed by an esteemed expert and dozens of testimonials. Or think about medicine's history of treatments like bloodletting, which fooled experts and patients for almost two and a half millennia.

The underlying problem here is that in the field of healthcare it's very easy for us to get fooled - even the experts. Because of this, it's imperative that the modern patient learn how to evaluate medical claims for themselves. As I will show you, the best way to do this is to take a bird's-eye view of healthcare and familiarize yourself with our past performance and success rates.

But to determine our success rate, we first need a gold standard, a yardstick to measure everything.

The invention of the gold standard, the RCT

Claims in medicine are supported by many different kinds of evidence. We've already mentioned two: expert opinion and anecdote. There are also animal studies, research in cells, and observational studies (finding patterns in large groups of people).

Unfortunately, they are all imperfect, and each fools us in a different way into thinking a claim is true when it is not. For example, many things that work in animals, or in a cell experiment, do not work in humans. When a treatment appears to work in a person, as when a friend improves after trying it, keep in mind that up to 90% of common complaints just get better on their own, and up to 35% of the response can be placebo. When we look at large groups of people and spot a trend, such as those who exercise having better health than those who do not, we find that the people who exercise also don't smoke, are wealthier, eat better, and are not obese. How do we know which factor(s) caused the better health? This is called confounding.

Over the past century, medical science has specifically addressed these shortcomings and developed a more reliable way to evaluate claims. The result is the randomized, double-blind, placebo-controlled trial, or RCT for short. It is a large trial in human subjects, not animals or cells. The subjects are split randomly to create two groups that are, on average, identical. The only difference between the groups is that one gets the treatment and the other gets a placebo. Neither the subjects nor the experimenters know who is in which group; the trial is double-blinded. Finally, the subjects are followed over time for real-world outcomes, like deaths or heart attacks, not just changes in cells or biomarkers.
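To see why random assignment matters, here is a minimal simulation sketch. Everything in it is made up for illustration: the population size, the 30% smoking rate, and the assumption that non-smokers are more likely to choose to exercise. It compares how unevenly smokers are spread between groups when people pick their own group versus when a coin flip picks it for them.

```python
import random


def smoker_gap(n=10_000, seed=0):
    """Compare smoker imbalance under self-selection vs. random assignment.

    All probabilities below are invented for illustration only.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        smoker = rng.random() < 0.3  # hypothetical confounder
        # Self-selection: assume non-smokers choose exercise more often.
        chose = rng.random() < (0.2 if smoker else 0.6)
        # Randomization: a coin flip that ignores everything about the person.
        assigned = rng.random() < 0.5
        people.append((smoker, chose, assigned))

    def rate(index, value):
        # Share of smokers among people whose group flag equals `value`.
        group = [p[0] for p in people if p[index] == value]
        return sum(group) / len(group)

    return {
        # Smoker rate in the "no exercise" group minus the "exercise" group.
        "observational_gap": rate(1, False) - rate(1, True),
        "randomized_gap": rate(2, False) - rate(2, True),
    }


if __name__ == "__main__":
    print(smoker_gap())
```

In the self-selected comparison the two groups differ sharply in smoking, so any difference in health could be due to smoking rather than exercise. With the coin flip, the gap shrinks toward zero, which is what lets an RCT attribute a difference in outcomes to the treatment itself.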

RCTs are expensive, difficult to execute, sometimes unfeasible, and, like other kinds of evidence, prone to fraud. However, when they are executed properly and their results are replicated, they are the gold standard: our strongest and most reliable kind of evidence. The impact of the RCT cannot be overstated, and it truly belongs on the list of the top medical inventions of all time. The RCT not only helps us evaluate medical claims more accurately; as a gold standard, it also lets us evaluate the other kinds of evidence.

Meta-research

To see if animal studies are reliable, we can find a medical claim based on an animal study and see if it was confirmed in an RCT. When we do this on a grand scale, this bird’s-eye view of research is called meta-research: research about research. In healthcare, there is fortunately a wealth of data to analyze. PubMed, a service of the National Library of Medicine, has been storing all of the published studies in biomedicine since the 1960s. There are over 35 million papers. With this, we can follow literally hundreds of thousands of medical claims, from inception to RCT.

We can establish a theory of knowledge (epistemology) in medicine, a hierarchy of evidence. Some kinds of evidence are more reliable than others.

The hierarchy of evidence

From the above findings, we can establish a hierarchy of evidence, or an evidence pyramid (see diagram below). The pyramid shape highlights that most medical claims are based on the weakest and easiest kinds of evidence to produce, anecdotes and expert opinion, whereas very few medical claims are based on the strongest, most expensive evidence: an RCT, or multiple RCTs in a systematic review.

Anecdotes are unreliable - Although some major medical discoveries have been inspired by anecdotes, like Botox and Viagra, they are the exception, and they were verified with RCTs. Anecdotes can be a starting point in medical research, not the endpoint.
Expert opinion is unreliable - To be clear, the term "expert opinion" refers to an opinion derived from a single expert's clinical experience and physiologic reasoning, in the absence of stronger kinds of evidence, such as RCTs.
Studies from academia are unreliable - Most of the animal studies and experiments in cells come from academia. When we try to replicate these experiments, less than 25% can be replicated. This is known as “the replication crisis” and is well documented in many fields of science. It is primarily due to academia's reward system: researchers are rewarded only for positive findings. For instance, in an analysis of 2,000,000 published studies, 96% had positive results. After all, no one ever won a Nobel Prize for showing a treatment does NOT work.
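
A reward system that favors positive findings can fill the literature with positive results even when a treatment does nothing. The toy simulation below is a sketch of that mechanism only. Its numbers (a 5% chance of a false positive per study and a 2% chance that a negative study gets published) are assumptions chosen for illustration, not figures from the analysis cited above.

```python
import random


def published_positive_share(n_studies=20_000, false_positive_rate=0.05,
                             publish_negative_prob=0.02, seed=1):
    """Share of PUBLISHED studies that are positive for a useless treatment.

    Every positive study is published; a negative study is published only
    with probability `publish_negative_prob`. All values are hypothetical.
    """
    rng = random.Random(seed)
    published_pos = 0
    published_neg = 0
    for _ in range(n_studies):
        # The treatment truly has no effect, so any positive is a fluke.
        positive = rng.random() < false_positive_rate
        if positive:
            published_pos += 1
        elif rng.random() < publish_negative_prob:
            published_neg += 1
    return published_pos / (published_pos + published_neg)


if __name__ == "__main__":
    print("negatives rarely published:", published_positive_share())
    print("everything published:",
          published_positive_share(publish_negative_prob=1.0))
```

When every study is published, positives stay close to the underlying fluke rate. When negative studies mostly stay in the drawer, the same useless treatment looks well supported in print.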

Success rate (past performance)

A medical claim usually starts at the bottom of the pyramid with weak positive evidence, and as it goes through the gauntlet of stronger and more reliable kinds of evidence, it mostly gets disproven; this is called the decline effect.

Even when the preliminary research from animal studies and cell experiments can be replicated, the likelihood that a treatment works in an RCT is still very low. How do we know this? As a regulated industry, the pharmaceutical industry is forced to use RCTs to confirm its ideas, and less than 1% of those ideas work. In Alzheimer's disease, for example, the industry found 140 drug candidates over the last 30 years and spent $600 billion. None of them worked in large RCTs.

Evaluating a medical claim

The process of approximating the likelihood that a medical claim is true is relatively simple: find the evidence for that claim, then determine at what level of the pyramid that evidence sits. The lower it sits, the less likely the claim is to be true. (For a more detailed explanation on evaluating medical claims, check out my lecture on YouTube)
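
The ranking step can be sketched in a few lines of Python. The level names are hypothetical labels, and placing cell and animal studies below observational studies is an assumption made for illustration; the only ordering taken from this article is that anecdotes and expert opinion sit at the bottom and RCTs and systematic reviews of RCTs sit at the top.

```python
from enum import IntEnum


class Evidence(IntEnum):
    """Pyramid levels, weakest (1) to strongest (6). Middle order is assumed."""
    ANECDOTE = 1
    EXPERT_OPINION = 2
    CELL_OR_ANIMAL_STUDY = 3
    OBSERVATIONAL_STUDY = 4
    RCT = 5
    SYSTEMATIC_REVIEW_OF_RCTS = 6


def strongest(evidence):
    """Return the highest pyramid level among the evidence found for a claim."""
    return max(evidence)


if __name__ == "__main__":
    # Hypothetical claim supported only by testimonials and lab work.
    found = [Evidence.ANECDOTE, Evidence.CELL_OR_ANIMAL_STUDY]
    print(strongest(found).name)
```

The function deliberately returns a level rather than a probability: the pyramid tells you how much weight the best available evidence deserves, not an exact chance that the claim is true.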

For a medical professional, finding evidence and assessing what kind it is can be relatively easy. For a patient, this is far more challenging, and it is a perfect application for artificial intelligence.

Using AI to help you evaluate claims

Despite AI’s reputation for being misleading and even hallucinating, if we learn to set the appropriate parameters and teach it HOW to think, it can be quite reliable. In AI lingo this is called "prompt engineering". With respect to healthcare, we must "prompt engineer" the AI with our lessons from meta-research, and direct it to look at the evidence, not expert opinion. Let me provide an example.

Suppose you want to find out if the popular supplement NAD+ will make you live longer. Asking AI, "Will NAD+ make me live longer?" will likely result in the response, "It looks promising". (Try it out for yourself). This, of course, is the wrong answer, because you neglected to instruct it how to evaluate the claim. So it pools all the biased experts selling the claim on blogs and podcasts, who far outweigh the actual evidence from published studies.

Instead ask AI: "Taking into account the hierarchy of evidence, the replication crisis, and the pharmaceutical industry success rate of less than 1%… will NAD+ make me live longer?" AI will now be forced to assess the level of the evidence of NAD+, and calculate a probability of NAD+ working based on that evidence. Since most of the research is animal studies and cell experiments, it will correctly inform you that, "the likelihood NAD+ will make you live longer is very low".
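
If you ask about many claims, the framing can be reused as a template. The helper below is a small sketch; `build_prompt` is a hypothetical name, and the function only assembles the text of the question. It does not call any AI service, so you would paste the result into whichever assistant you use.

```python
def build_prompt(question: str) -> str:
    """Prefix a medical question with the meta-research framing from the text."""
    framing = (
        "Taking into account the hierarchy of evidence, the replication "
        "crisis, and the pharmaceutical industry success rate of less "
        "than 1%..."
    )
    # Strip stray whitespace so the framing and question join cleanly.
    return f"{framing} {question.strip()}"


if __name__ == "__main__":
    print(build_prompt("will NAD+ make me live longer?"))
    print(build_prompt("do stem cell injections help knee pain?"))
```

Keeping the framing in one place means every question gets the same instructions, which makes the answers easier to compare across claims.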

Evaluating medical claims and shared decision-making

Evaluating medical claims is just the beginning of making good healthcare decisions. Just because a claim has a low probability of being true doesn't mean you shouldn't try it. There are many factors that go into a healthcare decision. Every patient is different, every situation is different, and every patient's tolerance for risk is different. Helping you decide which claims you should try and which you shouldn't is called shared decision-making. It is another essential critical thinking skill in medicine. You can read about it here.
