
Clinical Medicine is Bayesian


The Doctor, an 1891 portrait by Luke Fildes (Wikipedia)

What information do physicians base their decisions on? We would like to think it is science. After all, medical school admissions emphasize the sciences, much of the preclinical curriculum is built on them, and some physicians are scientists. It seems intuitive that modern medicine is an evidence-based scientific endeavor. And to an extent, this is true. However, the history of scientific medicine is very short; just two centuries ago, bloodletting was still practiced to treat virtually all ailments. Two centuries have not been enough to fully understand the complexity of the human body — in fact, “fully” may be too generous a word.


Much of medicine once resembled alchemy — it based its judgments largely on faulty assumptions immune to reality checks. Bloodletting, for example, was grounded in the idea of humoral imbalance: the belief that health was maintained by four amorphous fluids, blood among them, and that disease resulted from an excess of one. This was the age of top-down medicine. If the patient died, that outcome was not used to question the validity of the theory itself. The very idea that reality should test theory is a relatively recent discovery. Much like alchemy, in the face of the unknown, medical observations carried a psychological dimension. It is easy to dismiss the humoral theory as superstition, yet baseless theories of health have consistently shaped our perception of therapy.


In 19th-century Germany, for example, therapeutic practice was driven by the idea that vital energy obtained from pure nature — water, air, sun, and soil — is the foundation of health, and by the view of the human body as a vessel that becomes depleted without regular contact with nature. Hydrotherapy, spa therapy, sunbathing, and even homeopathy — the notion that diluting active molecules in water somehow imbued water with healing power — all derived from this vital energy theory. To this day, Germany nationally subsidizes spa therapy, and many households open windows even in the middle of winter to “exchange air.” Practitioners often offer scientific-sounding justifications, claiming it is an exchange of oxygen and carbon dioxide. The fact that these practices are baseless rarely troubles them. Most never attempt to collect evidence at all.

I mention this not as a historical curiosity. Certain areas of medicine remain grounded in theory without rigorous evidence. For example, much of orthopedic surgery and psychiatry rest on weak or nonexistent data. The idea that if a herniated disc compresses a nerve it should be surgically decompressed is highly intuitive. Detecting breast cancer early with a mammogram also feels self-evidently lifesaving. But intuition is dangerous in medicine, because it discourages us from asking whether these beliefs are actually true. In such cases, the practice of medicine becomes the practice of faith: physicians act on what they believe rather than what has been proven.


Although some specialties still operate in this way, evidence-based medicine has made enormous progress. At least in spirit, we now strive to ground our practice in the best available evidence. But what is the nature of this evidence? How much do we really know? Sometimes a little knowledge blinds us to the depth of our ignorance.


Medicine as Science vs. Medicine as Practice


When I entered medical school, ambitious and idealistic, I hoped that studying medicine would teach me the deep logic of how the body works. I expected an integration of the sciences that would reveal the design principles of the body, and through them, how diseases manifest as evolutionary trade-offs. Naively, I imagined that understanding biochemical pathways would explain why we develop diabetes.


Science has indeed revealed glimpses of the body’s inner logic, but this is largely outside the scope of medicine. Medicine is not about understanding the body; medicine is about knowing what to do when it becomes sick. Recognizing these as separate questions — and realizing that the medical school curriculum leaves no time for the former — profoundly disappointed me.


I had thought a doctor was to the human body what an engineer is to a car: someone who knows how it works, why it works that way, and why it breaks. In reality, a doctor is closer to a mechanic: you don’t need to understand the full complexity, or even most of it; you just need to know what to do. It felt as if I were officially giving up intellectual curiosity to become an automaton. Memorize, regurgitate, don’t ask why.


What made this more frustrating was that medical students often embody the Dunning–Kruger effect. While I struggled with how little medicine actually knows, the happiest students were those satisfied with simply managing the portion they were fed. They patted themselves on the back for “managing” medicine. They will no doubt make more suitable clinicians than I, as they will take pride in following guidelines.


What Kind of Knowledge Is Clinical Knowledge?


This question preoccupied me throughout medical school. Take inflammatory bowel disease (IBD) as an example. There are multiple approaches.


The first is biological, which was what I yearned to study. IBD arises when the immune system attacks the bowel. Normally, the immune system distinguishes self from non-self, but when this training fails, T cells may attack the body itself. Why the bowel specifically? Possibly because it contains the highest bacterial load and is therefore more heavily monitored, or because certain T cell subtypes are more error-prone. Perhaps faulty education of T cells, or cross-reactivity between microbial antigens and bowel proteins, explains it.


But this is not the concern of clinical medicine. Clinical medicine is about recognizing IBD when you see it: which demographics are affected, what the typical signs and symptoms are, which tests differentiate it from other conditions, and how to treat it.

These are two distinct types of knowledge. The former aims to understand why the disease occurs; the latter aims to separate it from other possibilities in order to choose a strategy.


The art of the latter is fundamentally different. The former is science — immunology, hypothesis testing, experiments. The latter is Bayesian inference. “Guessing” may sound terrible to patients, but such probabilistic inference is the core of practice.


A patient arrives. The doctor collects whatever information is immediately available to form an initial estimate of what is likely or unlikely. With each question, that estimate is updated: How long has it been going on? Is it acute or chronic? What kind of pain? Does it recur? Was there blood in the stool? Each answer shifts the probabilities.
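
To make this concrete, here is a minimal sketch in Python of that updating process. The candidate diagnoses, their base rates, and the likelihood of each answer given each diagnosis are all invented for illustration; real figures would have to come from epidemiology and clinical studies.

```python
# A minimal sketch of sequential Bayesian updating over a short differential.
# All priors and likelihoods below are invented, purely to show the mechanics.

def normalize(dist):
    total = sum(dist.values())
    return {d: p / total for d, p in dist.items()}

def update(prior, likelihoods):
    # Multiply the prior belief by P(answer | diagnosis), then renormalize.
    return normalize({d: prior[d] * likelihoods[d] for d in prior})

# Rough base rates among patients presenting with abdominal complaints (hypothetical).
beliefs = {"gastroenteritis": 0.70, "IBS": 0.23, "IBD": 0.05, "colorectal cancer": 0.02}

# Answer 1: the symptoms have been recurring for months (chronic, not acute).
beliefs = update(beliefs, {"gastroenteritis": 0.05, "IBS": 0.7, "IBD": 0.8, "colorectal cancer": 0.6})

# Answer 2: there has been visible blood in the stool.
beliefs = update(beliefs, {"gastroenteritis": 0.1, "IBS": 0.02, "IBD": 0.6, "colorectal cancer": 0.5})

for diagnosis, p in sorted(beliefs.items(), key=lambda kv: -kv[1]):
    print(f"{diagnosis}: {p:.2f}")
```

After just two answers, the probability mass shifts away from the common, self-limiting explanations and toward IBD — the same shift an experienced clinician performs implicitly, question by question.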


Over time, Bayesian inference in the human brain becomes pattern recognition. An experienced physician no longer thinks in terms of biology at all — the sequence of questions alone delineates the disease. It becomes like recalling a sequence of letters, or interpreting a Rorschach test. Why things are the way they are recedes into irrelevance. I could already feel this shift as a medical student. Preclinical courses still cared somewhat about “why.” Once you reach clinical training, it is just a puzzle. The meaning of the picture no longer matters.


Let me illustrate this with a case study:


Clinical Case


A man in his forties had been experiencing vague but recurring health problems for several years. At first, the symptoms seemed innocuous, episodic, and unrelated. He developed erythema nodosum — painful red lumps on his legs — especially after long runs. Several dermatologists assumed it was reactive or post-infectious and prescribed tetracycline. The rash resolved within days, so the issue was not pursued further.


Later, he began experiencing recurrent episodes of epididymitis, each marked by testicular pain and swelling. Tests showed nothing suspicious. His urologist prescribed moxifloxacin, a broad-spectrum fluoroquinolone, under the assumption of infection. Each time he took the antibiotic, symptoms resolved within days, only to recur weeks later. Subjectively, it felt as though the drug was working.


Some time later, the patient developed a deep vein thrombosis (DVT) in one leg — a painful clot that can be fatal if it dislodges. In a man of his age, without provoking factors, this was unusual. The emergency team made a reasonable inference: a clot at his age could signal a genetic predisposition or even an occult cancer. Testing revealed a heterozygous factor V Leiden mutation, relatively common in Western populations. A CT scan showed no cancer. The clot was explained away as mild genetic predisposition, and he was given a short course of apixaban.


While investigating the DVT, blood was drawn from his arm. He developed inflammation and pain at the site, with a long palpable clot in a superficial vein. Unlike DVT, such clots rarely dislodge, so he was treated with topical heparin cream. He reported that he often had exaggerated inflammatory reactions to even minor needle pricks. This was an important diagnostic clue, but his general practitioner dismissed it.


Over the years, his symptoms began to form a recognizable pattern. Episodes of erythema nodosum preceded attacks of epididymitis, which were followed by aphthous ulcers in the mouth. In one severe episode, the skin on his scrotum began to peel and slough. At this point, it was clear that these inflammatory conditions likely shared a single autoimmune cause.


The diagnosis came unexpectedly from a medical student from Turkey. In Turkey, Behçet’s disease is far more common than in Europe. Classic signs — aphthous ulcers, erythema nodosum, genital lesions — immediately raised suspicion. His exaggerated reaction at the needle prick site (the “pathergy reaction”) was another strong clue.

Armed with this suspicion, the patient revisited an allergy and immunology specialist he had previously seen. This time, after excluding active infection, the doctor prescribed colchicine. The result was dramatic: his recurrent inflammatory attacks quickly resolved.


This case illustrates how clinical medicine works in practice. A dermatologist seeing erythema nodosum must ask: what causes it? Current knowledge suggests it is a form of immune hypersensitivity that can be triggered by almost anything — infections, autoimmune conditions, or drugs. This range of possibilities is far too broad to act on directly. The clinician must narrow it down by asking what is most likely. Recent infections? Medications? Past history?


Diagnosis begins with population epidemiology: what are the common causes in this region? Each additional piece of patient information adjusts the likelihood of each explanation. This is the essence of Bayesian reasoning: starting from a base rate, then updating probabilities as new information arrives.


Crucially, medical diagnosis rarely comes from direct demonstration of the underlying biology. It comes from statistical inference. As this case shows, if base rates are low — as for Behçet’s in Europe — the disease is easily missed, even when the signs are present.


Unfortunately, and not for lack of effort, this statistical reasoning framework remains largely conceptual in current clinical medicine: it is neither feasible nor practical to bring rigorous numbers into everyday judgment. Much of this is because the numbers simply do not exist — ideally, we would have reliable statistics for every condition in every hospital region. Even when such numbers are theoretically accessible, for example where hospitals do keep track of their patients, they are rarely used systematically by practitioners to guide decisions. I will return to this point later when discussing the use of AI in medicine.


A 12th-century manuscript of the Hippocratic Oath in Greek, one of the most famous aspects of classical medicine that carried into later eras (Wikipedia).

This difference between statistical inference and probing biology might seem obvious or even subtle, but the implications are profound. Because medicine is fundamentally an art of guessing, we must accept the following:


(1) Guesswork is not about being correct — it is about betting on the most reasonable.


The famous dictum “first, do no harm,” popularly attributed to the Hippocratic oath, is a beautiful ideal that many medical students accept rather blindly. Rarely do they realize that this is not the premise of medicine. The goal of medicine is to do the least harm at the population scale, not to avoid all harm to every individual patient. I am not saying medicine “tries” to harm no one but ends up harming some by imperfection — rather, harm is strategic, not accidental.


When the Behçet’s patient above was prescribed antibiotics, clinicians were at least theoretically aware that antibiotics are not without consequences. Every medication and every invasive procedure in clinical medicine carries some degree of harm: sometimes minimal, like drowsiness from first-generation antihistamines; sometimes profound, like prophylactic organ removal. And I am not talking about malpractice. Even when medicine is practiced as intended, there is always a cost to guessing under uncertainty, and we base that guesswork on the expected benefit. But as with anything uncertain, this reward assessment may be fundamentally flawed. Medicine is not a black-and-white “do no harm” fundamentalism, but a “harm–benefit analysis” algorithm. Doing harm is part of the business if it is done for the sake of expected benefit. This may sound brutal, but there is no other way to approach uncertainty.


(2) Clinical medicine is fundamentally optimized at the population level.


That medicine operates on a harm–benefit trade-off is not inherently a problem. But the next point is a major challenge we must work to solve. Currently, the harm–benefit calculus is based almost entirely on population-scale information. In other words, the focus is on minimizing harm to the population by following predefined protocols. This reproducibility allows us to assess whether a procedure is effective, but only at the population level. Whether it works for you as an individual, we shall never truly know, because there is no practical way to set a proper control for each patient.


For example, whether a blood pressure medication works for you is not the concern of clinical medicine. Rather, your doctor prescribes it because in a group of patients with your profile, that medication is statistically better than placebo. In pharmacology, this expectation is captured by the number needed to treat (NNT): the average number of patients who must be treated for one to benefit. A low NNT indicates the drug is effective for many people, while a high NNT means it takes many patients to see one benefit. Surprisingly, antidepressants, despite their poor reputation, often have relatively low NNTs (below 10). Antihypertensives, by contrast, can have NNTs in the hundreds or even over a thousand. Virtually no treatment has an NNT of 1–2. This means that treatments rarely work in every case — or even in most cases. An NNT of 10 means ten people must be treated for one to experience measurable benefit. That is the realm of therapy we are talking about.
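
For readers who want the arithmetic: the NNT is simply the reciprocal of the absolute risk reduction. The sketch below uses made-up event rates, not figures from any particular trial.

```python
# NNT = 1 / ARR, where ARR (absolute risk reduction) is the difference in
# event rates between the control and treated groups. Rates below are invented.

def number_needed_to_treat(control_event_rate, treated_event_rate):
    arr = control_event_rate - treated_event_rate
    if arr <= 0:
        raise ValueError("no measurable benefit over control in these data")
    return 1.0 / arr

# Hypothetical: 12% of untreated patients have a stroke within five years,
# versus 10% of treated patients.
print(number_needed_to_treat(0.12, 0.10))  # 50.0 -> treat 50 people for one to benefit
```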


(3) Information in clinical medicine is mostly used to alter base rates, not to probe biology directly.


The pathophysiology of Behçet’s disease is complex and poorly understood. But diagnosing Behçet’s does not necessarily require that understanding — and this applies to most pathologies. The power of statistics is that it connects events without requiring knowledge of the underlying mechanism. To know there is a correlation requires no knowledge of how the variables are mechanistically linked.


This realization was terrifying for me as a medical student, because it essentially means biology is dispensable for many aspects of clinical medicine. For a student in love with biology rather than statistics, this felt like forbidden territory.


Take autoimmune disorders like ankylosing spondylitis, which correlates with the HLA-B27 gene, or Behçet’s disease, which correlates with HLA-B51. What these associations mean biologically is still largely a mystery, but medical students are tested on them because they help clinically. Why? Because clinical medicine is about altering base rates of confidence in a diagnosis, not explaining exactly how the disease developed. Of course, correlations are rarely perfect: not all Behçet’s patients carry HLA-B51, and not all HLA-B51 carriers develop Behçet’s. Nonetheless, this is how clinical medicine is built — medical students memorize correlative parameters that often make little logical sense together, because these snippets help clinicians make probabilistic guesses. For correlation, no mechanistic link to the disease process is required.
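
In probabilistic terms, such a marker acts through its likelihood ratio: it multiplies the pre-test odds of the diagnosis, whether or not anyone understands the mechanism. A minimal sketch, with an invented sensitivity and specificity standing in for a real HLA association:

```python
# Posterior odds = prior odds x likelihood ratio. The marker's sensitivity and
# specificity here are invented; the point is that no mechanism enters the math.

def posterior_probability(prior_prob, sensitivity, specificity, test_positive=True):
    prior_odds = prior_prob / (1 - prior_prob)
    lr = sensitivity / (1 - specificity) if test_positive else (1 - sensitivity) / specificity
    posterior_odds = prior_odds * lr
    return posterior_odds / (1 + posterior_odds)

# Suppose the clinical picture alone puts the disease at 5%, and the marker is
# carried by 60% of patients but only 10% of people without the disease.
print(posterior_probability(0.05, sensitivity=0.60, specificity=0.90))  # ~0.24
```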


This fundamental nature of medicine goes surprisingly unnoticed by many clinicians. I once heard a surgeon say, “If you take an HIV test and it’s positive, it means you have HIV. Simple as that.” Similarly, many doctors believe that if cancer is detected by mammography or CT, you have cancer and it must be removed. They are unaware that most tests, even molecular ones, must be interpreted through base rates and population-scale data. No test is perfectly sensitive and specific. HIV tests do produce false results. Even if the fraction of false results is small, in populations with very low disease prevalence, those false positives can overwhelm the true results.


Imagine a village of 100 people where only one person has HIV. If the HIV test has 99% sensitivity and 99% specificity, then the one infected person will almost certainly test positive. But among the 99 uninfected, 1% (about one person) will also test positive falsely. Now we have two positive results — only one is real. In this village, a positive HIV test has only 50% predictive value. This is not a very good test, despite being “99% accurate.”
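
Here is the village example worked through as code. The same function also shows what happens when the test is restricted to a pre-selected high-risk group, which is the point made next.

```python
# Positive predictive value depends on prevalence, not only on the test's
# sensitivity and specificity.

def positive_predictive_value(prevalence, sensitivity, specificity):
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)

# The village: 1 person in 100 infected, a test with 99% sensitivity and specificity.
print(positive_predictive_value(0.01, 0.99, 0.99))  # ~0.50

# The same test applied only to a pre-selected high-risk group (say 20% prevalence).
print(positive_predictive_value(0.20, 0.99, 0.99))  # ~0.96
```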


This statistical nature is why pre-selecting high-risk groups for testing is crucial — restricting testing changes the base rate, which improves predictive value. The same logic applies to cancer: not all tumors are “killer” tumors. Some regress; others grow so slowly they never cause harm. Imaging on a population scale will inevitably detect many of these indolent tumors, leading to invasive procedures that may cause more harm than good. This is why most cancers are not screened for at the population level.

Unfortunately, medical doctors are not systematically trained in statistical reasoning, despite so much of medicine being more statistics than biology.


(4) Rare events, outliers, or unusual patterns are particularly vulnerable to misdiagnosis.


Why is Behçet’s disease difficult to diagnose in Europe but not in Turkey? Simply because it is more common in Turkey. This highlights a deeper reality: the Bayesian framework of clinical medicine means that the more experience you gain, the harder it becomes to diagnose low-base-rate diseases.


Theodore Woodward famously said: “When you hear hoofbeats, think horses, not zebras.” In other words, look for the common cause, not the rare one. Erythema nodosum or epididymitis in Europe is most often infectious, so that’s what clinicians assume. This makes perfect sense — until years of horses make you forget that zebras exist. Yet zebras do exist, as the Behçet’s case demonstrates.


Modern medicine in the West essentially began with anatomical dissection. But as the anatomist’s scalpel separated the liver from the heart, it also separated concepts. Today hospitals reflect this: skin problems go to the dermatologist, heart problems to the cardiologist. This separation has logic, as organs do have distinct functions. Yet the liver’s job, like the brain’s, is to serve the whole body, not itself — physiology is interconnected.


While medicine recognizes inter-organ connections intellectually, the Bayesian structure of clinical reasoning leaves little room for such nuance. Erythema nodosum, epididymitis, aphthous ulcers, and DVT in the same patient, if considered together, would immediately suggest a unifying cause. But outpatient visits are organized in ways that make clinicians myopic: a dermatologist will ask thoroughly about skin problems but may ignore systemic symptoms. Bayesian inference and pattern recognition are only as good as the information you feed them.


If we had a systematic way to differentiate infectious from autoimmune inflammation in clinics, we might not need to rely so heavily on collecting every symptom. But medicine cannot yet do this. Instead, dividing information by organ system — the way specialties are structured — systematically introduces blind spots. This can be partly corrected by teaching doctors to ask about symptoms beyond their specialty, but the rarity of certain combinations still makes them easy to miss. Cough plus fever is an obvious connection. Erythema nodosum plus DVT is not.


Another pitfall is “hallucination” — seeing a pattern where there is none. For example, when a young woman presents with weight loss, no major findings, and no improvement with diet, the “typical” presentation suggests anorexia. It is difficult to think of celiac disease. Current medicine builds disease models primarily from statistical parameters, not from biology. What the body is “trying to do” is often unclear.


Diseases in clinical practice reveal themselves less like a sequence of causal events forming a logical story, and more like a half-finished puzzle with most pieces missing. New pieces appear, but you don’t know if they belong to this picture or another. Clinicians must act on this incomplete picture. And just as random patterns can look like faces, clinicians sometimes see patterns where none exist.


Ki67 stain calculation by the open-source software QuPath in a pure seminoma, which gives a measure of the proliferation rate of the tumor. The colors represent the intensity of expression: blue-no expression, yellow-low, orange-moderate, and red-high expression. Figure 2 from Lourenço et al., “Ki67 and LSD1 Expression in Testicular Germ Cell Tumors” (Life, 2022). Licensed under CC BY 4.0

The (Unavoidable) Rise of AI in Medicine


Many voice concerns about the possibility of AI taking over in medicine, worrying that it might strip the field of its “human touch.” I, for one, don’t see this as a problem. As I have argued above, much of medicine is already statistical inference and pattern recognition — tasks at which AI excels. If anything, it is often the so-called “human touch” that misguides this process: the rigid division of medicine into organ-based specialties, the obsession with neat “just-so” patterns, and the hubris of believing that a few years of scientific education somehow grants doctors deep understanding of how the body truly works.


AI, by contrast, holds great promise in overcoming many of these limitations. Human brains struggle to integrate information across multiple organ systems, but computers have no such problem. With comprehensive electronic health records, an AI system can effortlessly synthesize data, recognize cross-organ connections, and generate broad lists of differential diagnoses based on likelihood.


So will human doctors become obsolete? I hope not. What I envision is a division of labor: AI will take over the rote tasks of memorizing patterns and juggling disconnected parameters, while human clinicians will be freed to pursue what machines cannot — developing new strategies to probe the body and uncover mechanisms that go beyond the Bayesian framework.


This shift could also change the very design of medicine. Instead of forcing patients into single-organ specialties, we could build clinics where each case is approached through multiple lenses — both organ-oriented and systemic (some forms of this are already being tried). A unified database tracking health records across time would allow AI to highlight patterns invisible to human memory, while researchers could push forward efforts to improve biological readouts. The eventual goal would be a transition from statistical reasoning to biological reasoning.


Such a transition would also help us escape the “symptoms-first” model of medicine, in which care only begins once disease has already manifested. If we could probe biology deeply — especially in healthy states — we might detect deviations long before symptoms arise, fundamentally transforming clinical care.


None of this means abandoning statistics. Statistical reasoning is powerful and will always remain part of medicine. But the urge to move beyond it is not new. The very fact that medical education begins with molecular and cellular biology shows that the aspiration to ground medicine in mechanism is already present. My hope is that we first recognize clearly that clinical medicine’s statistical and biological approaches are distinct, requiring different strategies from the outset. Much of the former will inevitably be handed over to AI. The brightest minds in medical school should devote their ambitions to the latter.
