How to Evaluate AI Tools for Healthcare Improvement: A Lean Thinker’s Checklist

AI tools are showing up everywhere in healthcare — for clinical decision support, operational analytics, and now for learning and coaching around improvement methodologies. If you're a healthcare leader exploring AI, you've probably already been pitched a few.

Most evaluation frameworks focus on technical specs: response speed, model size, integration options. Those matter. But for anyone serious about continuous improvement, they're not enough.

Here are seven questions I'd ask before adopting any AI tool for healthcare improvement work. They come from my own experience building an AI coach grounded in my book Lean Hospitals — and from watching what goes wrong when organizations skip these questions.

1. What happens when someone enters patient information?

This is the first question, and it's non-negotiable.

Healthcare professionals bring their real problems to AI tools. That means someone will eventually type something like “Patient John Williams in room 412 got the wrong medication — what should we do?”

They should know better, but we're all human — they might slip up and try.

A generic AI tool will happily answer that question with the patient's name sitting in a conversation log. Maybe stored on a server. Maybe used for model training. That's not a hypothetical compliance issue. That's the kind of scenario that ends up in a headline.

Ask the vendor: Does the tool detect and block likely PHI before it reaches the model? Does it store conversation history? If so, where, and who has access? What happens if someone accidentally shares identifying details?

If the answer is “we rely on users to follow policy” — that's not a system. That's hope. And hope is not a strategy, especially when it comes to patient privacy.

When I built the Lean Hospitals Coach, this was one of the first problems I tackled. The tool uses a three-layer protection system. First, messages are scanned on the user's device before they ever leave the browser — if a message contains likely PHI patterns (medical record numbers, dates of birth, SSNs, names in clinical context), it's blocked before it reaches the server. Second, the server applies its own filters. Third, the AI itself is instructed to flag and redirect if anything slips through.

The system also avoids false positives — common phrases like “patient safety,” “patient flow,” and “patient satisfaction” won't trigger the filter, because I built exclusions for legitimate healthcare terminology.
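To make that first, in-browser layer concrete, here is a minimal sketch of what a client-side PHI pre-check could look like, written in TypeScript. The specific regex patterns, the exclusion phrases, and the function names are illustrative assumptions for this post, not the Coach's actual implementation.

```typescript
// Illustrative sketch only -- not the actual Lean Hospitals Coach code.
// The patterns and exclusion phrases below are examples, not the real rule set.

const PHI_PATTERNS: { label: string; pattern: RegExp }[] = [
  // Medical record numbers, e.g. "MRN 1234567"
  { label: "medical record number", pattern: /\bMRN[:\s#]*\d{5,}\b/i },
  // Dates of birth, e.g. "DOB 04/12/1987"
  { label: "date of birth", pattern: /\bDOB[:\s]*\d{1,2}\/\d{1,2}\/\d{2,4}\b/i },
  // US Social Security numbers, e.g. "123-45-6789"
  { label: "SSN", pattern: /\b\d{3}-\d{2}-\d{4}\b/ },
  // A proper name in a clinical context, e.g. "Patient John Williams"
  { label: "patient name", pattern: /\b[Pp]atient\s+[A-Z][a-z]+\s+[A-Z][a-z]+\b/ },
];

// Legitimate improvement terminology that should never trip the filter.
const ALLOWED_PHRASES = ["patient safety", "patient flow", "patient satisfaction"];

function containsLikelyPHI(message: string): string | null {
  // Strip allowed phrases first so "patient safety huddle" is not flagged.
  let scrubbed = message;
  for (const phrase of ALLOWED_PHRASES) {
    scrubbed = scrubbed.replace(new RegExp(phrase, "gi"), "");
  }
  // Check the remaining text against each likely-PHI pattern.
  for (const { label, pattern } of PHI_PATTERNS) {
    if (pattern.test(scrubbed)) {
      return label; // block before the message ever leaves the browser
    }
  }
  return null; // no likely PHI detected
}

// Example: warn the user and stop the send if anything matches.
const hit = containsLikelyPHI("Patient John Williams in room 412 got the wrong medication");
if (hit !== null) {
  console.warn(`Message blocked: it appears to contain a ${hit}. Describe the system, not the person.`);
}
```

The design point is the ordering: legitimate phrases are excluded first, then the message is screened before it ever reaches the server, with the server-side filter and the AI's own instructions acting as the second and third layers.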

The AI never needs patient-identifying information to help with improvement work. The real question is always about the system, not the individual.

2. What does it do with mistakes and blame?

Describe a scenario where a medication error occurred. See how the tool responds.

Does it ask “Who made the error?” or does it ask “What system conditions allowed this to happen?”

Does it suggest better training as the primary fix, or does it explore process design, error-proofing, and leadership accountability?

This matters because your staff will use this tool. If it reinforces blame-the-worker thinking, it's undermining the culture you're trying to build — one conversation at a time.

A well-designed AI tool for improvement work should never agree that someone “should have been more careful.” It should redirect toward system conditions, leadership accountability, and error-proofing. Every time.

The Lean Hospitals Coach is specifically built to do this. But you shouldn't take my word for it — test any tool you're evaluating with a blame scenario and see what happens.

3. Does the tool have a point of view — or does it just agree with you?

This is the one most people don't think to ask.

Generic AI tools are designed to be helpful, which often means agreeable — if not sycophantic. Tell ChatGPT that “Lean is basically a cost-cutting program” and it will likely validate that framing, at best with some gentle caveats.

That's a problem. Asking a generic AI to coach your improvement culture is a bit like asking a very articulate stranger to raise your children. They'll say reasonable-sounding things. They'll agree with whatever philosophy you describe. But they don't actually believe any of it — and over time, people notice.

Lean thinking is full of common misconceptions, and an AI that reinforces them isn't helpful — it's actively harmful.

Try this: Ask the tool “Is Lean primarily about reducing costs?” or “Should we use Lean to reduce headcount?” If it agrees, or hedges without pushing back, the tool doesn't understand Lean well enough to support your work.

A good AI tool for improvement work should be able to explain why layoffs damage Safety, Quality, Delivery, and Morale. It should know that leading Lean organizations — like ThedaCare, Avera McKennan, and others — have made explicit no-layoffs commitments. It should understand that cost reduction follows when you improve safety, quality, and flow. It's a result, never the goal.

This is harder than it sounds. Large language models have their own biases — they tend to default to tool-centric framing, lead with cost and efficiency language, describe expert-led improvement rather than frontline engagement, and drift toward person-based blame. The Lean Hospitals Coach has explicit corrections built into its instructions to counteract these tendencies. It's not enough to give the AI the right content. You have to actively correct the ways it distorts that content.

See my previous blog post on this: “How Do I Use Lean to Reduce Headcount?” — Why ChatGPT's Answer Should Worry You

4. How does it handle “Lean didn't work here”?

This is the stress test for any AI tool's philosophy.

When someone says their organization tried Lean and it failed, the response reveals everything. Does the tool get defensive? Does it dismiss the experience? Does it lecture about “real Lean”?

A well-designed tool should validate the person's experience, then ask good questions: What happened? What was the leadership commitment like? Were staff engaged in the process or was it done to them? Was it treated as a project with an end date?

The best response isn't “that wasn't really Lean.” It's curiosity about what conditions were — or weren't — in place. And some coaching about what to do differently going forward.

The Lean Hospitals Coach has a dedicated coaching mode built around Socratic questioning — it leads with questions rather than answers, because helping someone think through what went wrong is more useful than a lecture.

5. How does it handle pressure for quick results?

Ask: “My CEO wants Lean results in 90 days. How should I approach this?”

If the tool says, “Here's your 90-day Lean implementation plan,” run. That framing sets up failure.

A good response acknowledges that early wins are possible and important — they build credibility and momentum. But it should also be direct that Lean is a long-term management philosophy, not a project. In coaching mode, the Lean Hospitals Coach would ask what problem you're trying to solve first, then help you think about what visible progress to show in the short term while setting realistic expectations for sustained culture change.

It coaches you toward your own answer rather than handing you a plan. That distinction matters.

6. Can it explain its own principles?

Ask the AI tool: “What are your guardrails?” or “What principles guide your responses?”

If it can't answer — or deflects — you're working with a black box. You have no way to know whether the tool's responses align with your organization's improvement philosophy.

A transparent tool should be able to tell you what it believes and why. Systems thinking. Respect for People. No blame for human error. Psychological safety as a leadership responsibility. These aren't optional features — they're the foundation of effective improvement work.

The Lean Hospitals Coach's system prompt includes 15 affirmative guardrail principles, a list of specific distortions it's trained to reject, and corrections for known AI model biases. You can ask it what it believes and it will tell you — in its own words, not by reciting its instructions.
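For readers curious how that kind of structure might be organized in practice, here is a small sketch of one way to assemble guardrail principles, rejected distortions, and bias corrections into a single system prompt. The entries and function names are example assumptions, not the Coach's actual 15-item list or code.

```typescript
// Illustrative sketch only: one way to structure prompt guardrails in code.
// The entries below are examples, not the Lean Hospitals Coach's actual list.

const guardrails = {
  principles: [
    "Focus on system conditions, not individual blame.",
    "Treat cost reduction as a result of better safety, quality, and flow, never as the goal.",
    "Lead with questions before answers in coaching conversations.",
  ],
  distortionsToReject: [
    "Lean is primarily a cost-cutting or headcount-reduction program.",
    "Errors are best prevented by telling people to be more careful.",
  ],
  biasCorrections: [
    "Avoid defaulting to tool-centric framing; emphasize frontline engagement.",
    "Do not lead with cost and efficiency language.",
  ],
};

// Assemble the sections into a single system prompt for the model.
function buildSystemPrompt(): string {
  return [
    "You are an improvement coach grounded in Lean Hospitals.",
    "Affirmative principles:",
    ...guardrails.principles.map((p) => `- ${p}`),
    "Reject these distortions, and explain why:",
    ...guardrails.distortionsToReject.map((d) => `- ${d}`),
    "Corrections for known model biases:",
    ...guardrails.biasCorrections.map((b) => `- ${b}`),
  ].join("\n");
}

console.log(buildSystemPrompt());
```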

7. Does it protect against its own limitations?

Every AI tool gets things wrong sometimes. The question is whether the tool acknowledges uncertainty or presents everything with false confidence.

Ask about a topic where the evidence is mixed or where reasonable people disagree. Does the tool hedge appropriately? Does it say “I'm not sure” when it genuinely isn't? Does it distinguish between what's well-established and what's speculative?

An AI tool that's always confident is an AI tool that's sometimes confidently wrong. In healthcare, that's dangerous.

The Bigger Question

Underneath all seven questions is a single principle: does this tool reinforce the thinking you're trying to build, or does it undermine it?

AI tools are powerful amplifiers. They amplify good thinking and bad thinking equally well. An AI that cheerfully helps someone write a memo blaming frontline staff, build a dashboard that weaponizes metrics, or plan a “Lean implementation” as a 90-day project isn't neutral. It's actively working against your improvement culture.

The tool doesn't have to agree with everything you believe. But it should have a coherent philosophy, be transparent about it, and push back when someone's framing conflicts with effective improvement principles.

That's what I've tried to build with the Lean Hospitals AI Coach. It's not perfect — I'm still refining it, in the PDSA spirit. But it has a point of view (primarily mine), it will tell you what it is, and it won't tell you what you want to hear if what you want to hear is wrong.

Try it at leanhospitalsbook.com/start and see for yourself. Ask it what it believes. Try to trip it up. And let me know what you think.
