Why We Need Frameworks
A Greek philosopher, a Terminator, and TheDude walk into a trust-building workshop. The philosopher wants to contemplate the nature of reliability. The Terminator wants to optimize trust metrics. TheDude just wants to know: "Man, can I count on you when it matters?"
Most medical AI critique falls into two camps: pure hype ("this will revolutionize healthcare!") or pure fear ("AI will kill us all!"). Neither is useful.
We need frameworks that ground our analysis in reality—biological reality, clinical reality, and liability reality. Not theoretical concerns. Not abstract ethics. Real-world conditions that determine whether AI helps or harms.
These six frameworks guide every case analysis on this site:
The Velociraptor Test
What It Means
Evolution is the ultimate debugger. Natural selection refined threat detection, pattern recognition, and contextual judgment over 3.8 billion years. Every successful adaptation was tested against survival pressure. Every failed approach was eliminated from the gene pool.
AI trained on text and images for 18 months missed some edge cases.
Why It Matters
When a patient walks into a clinic pale, sweaty, and clutching their chest, a human physician doesn't need an algorithm to know something's wrong. The velociraptor brain—that ancient threat detection system debugged over millions of years of "get it wrong and you die"—already knows.
AI sees: text describing symptoms, maybe an image, statistical correlations in training data.
Humans sense: actual environmental threats that required immediate, correct response or you didn't survive to reproduce.
Application to Medical AI
- Pattern recognition without survival pressure is just sophisticated guessing
- Training data doesn't include "die if you're wrong" feedback
- Context matters, and context comes from environmental sensing
- Confidence without consequence creates dangerous overreach
Example: MedGemma MRI Case
The AI confidently diagnosed from a single MRI slice because it faced no consequences for being wrong. A human radiologist knows: recommend brain surgery based on inadequate imaging and someone's skull gets opened. That's selection pressure. That's why humans say "I need more views" when they need more views.
The 10 Billion Sensors Principle
What It Means
Intelligence isn't just processing power. It's environmental awareness. And environmental awareness requires sensing.
Humans have:
- ~126 million photoreceptors (vision)
- ~16,000 hair cells (hearing)
- ~10 million olfactory receptors (smell)
- ~2-4 million mechanoreceptors (touch, proprioception)
- ~10,000 taste receptors
- Billions of nociceptors (pain detection)
All constantly sampling the environment, integrating information, detecting threats, sensing context.
AI has: whatever pixels or text you give it.
Why It Matters
Clinical medicine depends on sensing. A surgeon can feel tissue tension. A cardiologist can hear subtle murmurs. An emergency physician can smell ketoacidosis before lab results confirm it. A pediatrician can see when a mother's concern goes beyond typical parental worry.
✅ What Humans Detect
- 👀 Diaphoresis (sweating)
- 👃 Ketoacidosis odor
- 👂 Voice tremor
- 🤚 Skin temperature
- 🧠 Patient fear/confusion
- ⏰ How fast things change
❌ What AI Detects
- 📊 Text patterns
- 📊 Image pixels
- 📊 Statistical correlations
No smell. No touch. No hearing. No environmental context. No temporal awareness.
Application to Medical AI
- The sensing gap is unbridgeable with current technology
- AI cannot detect what it cannot sense
- Clinical decision-making requires multi-modal sensory integration
- Pattern recognition without sensing is incomplete data processing
The Malpractice Insurance Reality Check
What It Means
I pay malpractice insurance. I've been paying it for 20 years. I pay it because when I make a mistake, someone gets hurt, and I'm accountable for that harm.
Google doesn't pay malpractice insurance. OpenAI doesn't pay malpractice insurance. They release systems with disclaimers: "Not for clinical use. No warranty. Use at your own risk."
But here's the thing: when those systems are used clinically (and they will be), who faces consequences?
Why It Matters
Accountability structures shape behavior. When you face meaningful consequences for failures, you build systems differently. You test more carefully. You validate more thoroughly. You acknowledge limitations honestly.
When you face zero consequences, you optimize for different metrics. Speed. Capability. Impressive demos. Market share.
The Accountability Asymmetry
❌ AI Companies
Liability: Zero (disclaimer protected)
Malpractice Insurance: $0
Consequences: None when system fails
Incentives: Capability, speed, market adoption
✅ Physicians
Liability: Complete
Malpractice Insurance: $50K-200K+ annually
Consequences: Lawsuits, license loss, career ending
Incentives: Patient safety, accuracy, validation
Application to Medical AI
- Until developers face consequences, they're not motivated to prevent failures
- Physicians bear 100% of liability for trusting AI systems
- This asymmetry creates perverse incentives for rapid deployment
- Real accountability requires meaningful consequences for failures
Intelligent Humility
What It Means
Intelligent Humility is a design principle: build systems that know what they don't know.
Not as a guardrail added later. Not as a disclaimer in the terms of service. As an architectural feature from the ground up.
When an AI system encounters a query outside its validated knowledge domain, it should say: "I don't have reliable information on this" rather than generating confident-sounding nonsense.
Why It Matters
Most AI failures in medicine come from confident confabulation—retrieving tangentially related content and presenting it as if it answers the question.
The most dangerous medical statement isn't "I don't know." It's "I'm confident" when you shouldn't be.
What Intelligent Humility Looks Like:
Query: "Should I increase the patient's dosage?"
System Without Humility: Retrieves information about the medication, generates confident-sounding dosing recommendations based on pattern-matching, presents with citations.
System With Humility: "I don't have access to this patient's complete medical record, current medications, lab values, or contraindications. Dosing decisions require comprehensive clinical context I don't possess. This requires physician judgment."
Application to Medical AI
- Build systems that can't generate responses outside validated domains
- Constrain knowledge sources to curated, validated content
- Eliminate hallucination through architecture, not filtering
- Make "I don't know" a first-class output, not a failure state
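The last point can be made concrete in code. Here is a minimal sketch of "I don't know" as a first-class, typed output rather than an error path. Everything in it is illustrative: the `VALIDATED_DOMAINS` set, the `Answer` type, and the `lookup_validated_content` placeholder are hypothetical names standing in for a real domain classifier and retrieval layer, not any production system described on this site.

```python
# Sketch: refusal as a normal result type, not a failure state.
# All names here (VALIDATED_DOMAINS, Answer, lookup_validated_content)
# are illustrative stand-ins, not a real system's API.

from dataclasses import dataclass

# Domains for which curated, validated content exists (hypothetical).
VALIDATED_DOMAINS = {"cardiology", "dermatology"}


@dataclass
class Answer:
    text: str
    grounded: bool  # True only when the text came from validated content


def answer_query(query: str, domain: str) -> Answer:
    if domain not in VALIDATED_DOMAINS:
        # Outside the validated domain: return an explicit, typed refusal.
        # Callers must handle this case; it is not an exception or an error.
        return Answer(
            text="I don't have reliable information on this.",
            grounded=False,
        )
    return Answer(text=lookup_validated_content(query, domain), grounded=True)


def lookup_validated_content(query: str, domain: str) -> str:
    # Placeholder for retrieval restricted to a curated corpus.
    return f"[validated {domain} content matching: {query}]"
```

Because the refusal is part of the return type, downstream code cannot quietly ignore it the way a disclaimer buried in terms of service can be ignored.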
Content-Controlled Intelligence
What It Means
Most medical AI systems are trained on everything: peer-reviewed journals, Reddit threads, blog posts, that one article about essential oils curing cancer, and approximately 47 million pages of SEO-optimized garbage.
When they retrieve information, they're pulling from all of that, with no way to distinguish reliable from unreliable sources.
Content-Controlled Intelligence flips this: constrain the AI's knowledge to validated, curated sources. If it's not in the verified corpus, it doesn't exist for the AI.
Why It Matters
Hallucinations happen when AI systems try to fill knowledge gaps by generating plausible-sounding content. The solution isn't better hallucination detection—it's preventing hallucination through architectural constraint.
Case Study: EdAI Systems
We generate medical board certification questions using Claude Sonnet constrained to StatPearls© content only. Zero hallucinations over 18 months. How? The AI literally cannot access information outside the curated medical corpus.
Result: 100% customer retention across four medical specialty boards, producing 20% of actual certification exams for two boards.
The Architecture
- Curate validated knowledge sources (peer-reviewed, board-approved)
- Constrain AI access to only these sources
- When query is outside validated domain → "I don't know"
- No retrieval from general internet
- No pattern-matching from unreliable training data
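The architecture above can be sketched in a few lines. This is a toy illustration of the constraint principle, not the actual EdAI implementation: the corpus contents, the word-overlap scoring, and the `min_overlap` threshold are all assumptions made for the example. The point is structural: the only possible outputs are curated passages or an explicit refusal.

```python
# Sketch of content-controlled retrieval: answers can only come from a
# curated corpus; anything outside it yields "I don't know."
# Corpus contents and the overlap scoring are illustrative assumptions.

CURATED_CORPUS = {
    "doc-001": "Beta-blockers reduce myocardial oxygen demand.",
    "doc-002": "Metformin is first-line therapy for type 2 diabetes.",
}


def retrieve(query: str, min_overlap: int = 2) -> str:
    """Return the best-matching curated passage, or an explicit refusal."""
    q_terms = set(query.lower().split())
    best_id, best_score = None, 0
    for doc_id, text in CURATED_CORPUS.items():
        score = len(q_terms & set(text.lower().split()))
        if score > best_score:
            best_id, best_score = doc_id, score
    if best_id is None or best_score < min_overlap:
        # Query falls outside the validated corpus: refuse rather than
        # generate plausible-sounding text to fill the gap.
        return "I don't know."
    return f"[{best_id}] {CURATED_CORPUS[best_id]}"
```

Nothing in this design detects hallucinations after the fact; the retrieval path simply has no route to unvalidated content, which is the "prevention through constraint" the list describes.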
Application to Medical AI
- Garbage in, garbage out—so control what goes in
- Prevention through constraint beats detection through filtering
- Validated sources eliminate need for hallucination detection
- Specialization over generalization for high-stakes domains
Evolution as Debugger
What It Means
Evolution is the most rigorous testing framework ever devised. Every organism alive today represents a lineage that survived countless challenges: predators, disease, environmental change, resource competition, mate selection.
Every adaptation was field-tested under survival pressure. Failed approaches were eliminated. Successful strategies were refined over millions of generations.
The result: biological systems with exquisite sensing, contextual judgment, uncertainty tolerance, and threat detection capabilities.
Why It Matters
We're trying to replicate human intelligence using algorithms trained on text. But human intelligence isn't separable from:
- Embodied sensing (10 billion sensors)
- Evolutionary selection pressure (3.8 billion years)
- Environmental context (real-time threats and opportunities)
- Consequence awareness (survival depends on accuracy)
AI has none of these. It's pattern-matching without the debugging that makes pattern-matching reliable.
What Evolution Debugged
- Threat Detection: False negatives killed you; humans evolved to be slightly paranoid
- Uncertainty Tolerance: Overconfidence killed you; humans evolved appropriate caution
- Contextual Sensing: Missing context killed you; humans evolved multi-modal integration
- Rapid Assessment: Slow decisions killed you; humans evolved fast pattern recognition
- Social Cues: Misreading others killed you (or prevented reproduction); humans evolved sophisticated empathy
Application to Medical AI
- Don't assume AI can replicate evolved capabilities without similar selection pressure
- Trust biological intuitions that survived millions of years of testing
- Maternal instinct (threat detection) > algorithmic reassurance
- Physician pattern recognition (refined through consequences) > statistical correlation
- Evolution's false positive bias (caution) > AI's confidence bias
How We Apply These Frameworks
Every case in our archive is analyzed through all six lenses:
- Velociraptor Test: What survival pressure would have prevented this failure?
- 10 Billion Sensors: What did the AI fail to sense that humans would detect?
- Malpractice Insurance: Who faces consequences when this fails?
- Intelligent Humility: Should the system have said "I don't know"?
- Content Control: Would constraining knowledge sources have prevented this?
- Evolution as Debugger: What evolutionary wisdom was ignored?
This isn't anti-AI critique. It's pro-patient-safety analysis grounded in biological and clinical reality.
We're not asking "can AI do this?" We're asking "should AI do this, and if so, how do we build systems that don't kill people?"