Consulting Case Studies Need Statistical Validity

One of the books that's had the biggest impact on my work is Donald Wheeler's Understanding Variation: The Key to Managing Chaos. It's the best guide for applying simple and powerful “statistical process control” (or SPC) practices to management data and decision making.

I recently read a consulting case study, “Harris Methodist saves $648,695 through SIPOC process changes.”

That's a suspiciously precise number, but that's not my real beef.

The case study is a good one, highlighting how a Fort Worth, TX hospital made improvements to its E.D. processes that led to shorter length of stay (improved patient flow and improved capacity).

The results claimed:

  • The total Triage cycle time, from patient arrival to bed placement, was reduced by 23 minutes.
  • The total ED-IP cycle time, from patient admit order to IP arrival, was reduced by 33 minutes.
  • The average LOS decreased from 97 to 61 minutes.
  • The average patient satisfaction increased from 87.9% to 89%.

One observation I've made recently is that consulting (or hospital) case studies should do more than report before-and-after numbers. That's just two data points, and two data points don't make a trend, as they say.

A simple before/after comparison doesn't have a time scale, and it leaves open the question of sustainability. Kaizen events are notorious for a quick burst of excitement and improvement… but then what happens? “How do we sustain improvements?” is one of the most common questions in the Lean world. Virginia Mason once reported backsliding in 60% of their week-long Rapid Process Improvement Workshops. That's not a good sustainment rate.

Updated: The reference for the Virginia Mason number is the article “Seeking Perfection in Health Care: Applying the Toyota Production System to Medicine” (to be fair, it cites 2004 numbers; they have undoubtedly gotten better since). The article said:

During an assessment in late 2004 that reviewed and remeasured all improvement efforts to date, we were only holding the gains on about 40 percent of those changes, partially because it is easy to slip back into old ways of doing things if there is a lack of accountability and follow-through.

Back to the main story:

One way that case study writers can show sustainment is to include a time series chart or, better yet, a control chart (as Dr. Wheeler demonstrates in his book).
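For readers who want to see the arithmetic, here is a minimal sketch (in Python) of the limits calculation for the XmR “individuals” chart that Wheeler teaches. The monthly values are made up for illustration; they are not the case study's data.

```python
# Sketch: natural process limits for an XmR ("individuals") chart,
# per Wheeler. The monthly satisfaction values below are made up
# for illustration -- they are not the case study's data.

scores = [86.5, 87.2, 85.9, 88.0, 86.1, 87.5,
          85.8, 86.9, 87.1, 86.3, 88.2, 86.7]

mean = sum(scores) / len(scores)

# Average moving range: mean of the absolute point-to-point changes
moving_ranges = [abs(b - a) for a, b in zip(scores, scores[1:])]
avg_mr = sum(moving_ranges) / len(moving_ranges)

# Wheeler's scaling constant for individuals data is 2.66
upper_limit = mean + 2.66 * avg_mr
lower_limit = mean - 2.66 * avg_mr

print(f"Mean: {mean:.1f}%")
print(f"Natural process limits: {lower_limit:.1f}% to {upper_limit:.1f}%")
# A point outside these limits (or a long run on one side of the
# mean) signals a real change; anything inside is likely just noise.
```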

In the case I've linked to, the final statement is:

The average patient satisfaction increased from 87.9% to 89%

Is that at all sustained or statistically significant? We don't know with just two data points.

Thankfully, the linked case study also gives us a time series chart, as shown below, so cheers to them for not stopping at the before-and-after numbers.

Those of you who are familiar with control charts will recognize that the change pictured above isn't really a statistically significant improvement. Ironically, they might have included the chart to try to bolster their case.

If Harris Methodist had a “stable system” before the change (which apparently took place in November 2008), then the next 12 data points show what appears to still be a stable system around the mean of 86.8%.

One of the SPC rules (the “Western Electric Rules”) for detecting a statistical shift is to have EIGHT consecutive data points above or below the mean. In the above chart, we only have FOUR. It's not statistically significant; it's not a process shift.
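If it helps to see that rule in code, here's a minimal sketch; the function name and sample values are mine, purely for illustration:

```python
def run_signals_shift(values, mean, run_length=8):
    """Return True if run_length consecutive points fall entirely
    above or entirely below the mean (the Western Electric run rule)."""
    run, side = 0, 0  # current one-sided run length; +1 above, -1 below
    for v in values:
        s = 1 if v > mean else (-1 if v < mean else 0)
        # A point exactly on the mean resets the count in this sketch
        run = run + 1 if (s != 0 and s == side) else (1 if s != 0 else 0)
        side = s
        if run >= run_length:
            return True
    return False

# Four points above the 86.8% mean -- like the tail of the chart in
# the case study -- is not enough to signal a shift:
print(run_signals_shift([88.0, 88.5, 89.0, 88.7], mean=86.8))  # False
```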

It's actually more true, statistically, to say there was no improvement. The last four data points could be “common cause” variation — a.k.a., noise or statistical chance. They may have declared victory prematurely, as the next month's data would just as likely be 85% as it would be 88% – meaning the process hasn't necessarily gotten better.

It would be nice if we could have some agreement and standards about how to represent data like this in case studies, but that's not likely to happen.

I'm not saying the hospital or the consultant didn't make things better. I'm just saying that the above chart doesn't, on its own, prove that case.

Some of this might seem sort of esoteric if you haven't read Wheeler's book. Go get the book, or you can read articles on his website. I tried to cover this topic a bit in my book, Lean Hospitals: Improving Quality, Patient Safety, and Employee Engagement, as well.

Have you been able to apply SPC principles to your own management work? Have you been able to use SPC to help gauge if you really had a statistically significant process shift? If this doesn't make sense, ask questions and I'll do my best to respond in comments.

In the near future, I'll share some data from a former client of mine showing three years of sustainment (actually a few positive process shifts, improvements that are statistically significant).


Mark Graban
Mark Graban is an internationally recognized consultant, author, professional speaker, and podcaster with experience in healthcare, manufacturing, and startups. Mark's new book is The Mistakes That Make Us: Cultivating a Culture of Learning and Innovation. He is also the author of Measures of Success: React Less, Lead Better, Improve More, the Shingo Award-winning books Lean Hospitals and Healthcare Kaizen, and the anthology Practicing Lean. Mark is also a Senior Advisor to the technology company KaiNexus.

23 COMMENTS

    • How would you prove the difference is statistically significant? I didn’t do it in the example here (for one, there aren’t enough data points), but calculating control limits for the patient satisfaction chart would tell us if a single data point was significantly higher than the usual trend.

      • Actually, if you look carefully at the graph on customer satisfaction, you can see some faint blue shaded areas in the background. These appear to be the varying control limits of what looks like a c-chart. The control limits show this is just common cause variation and no process change.

  1. Wow, an ED LOS of 61 minutes! I don’t believe it.

    Consultants and hospitals frequently overstate the value of reducing LWBS (left without being seen) rates. LWBS patients leave because they aren't all that sick, and a recent Canadian study shows that they have a much lower admission rate than the general ED population. My observation is that they are also much more likely to have a higher charity/bad-debt incidence.

    Imagine that. A consultant overstating their results!

  2. Mark:
    Most of the data we have to work with from the shop floor are NOT experimental, and are not easily (legitimately) subjected to t-tests, F-tests, regression analysis, etc., for statistical purposes.

    Looking at trends over time where lots of factors are uncontrolled (supervision, finance, the economy, how the Eagles did last night, etc.) it’s awfully hard to test the effect of the particular factor of interest (our deliberate changes/improvements).

    Our answer: don't crow about anything but long-term trends. We follow data at the most granular level (typically weekly) but also plot the 13-week rolling average and the 52-week rolling average. Together, these three paint a pretty clear picture (see the sketch after this comment), though cause-and-effect associations are always going to be dicey in the real world where so many influences are uncontrolled.

    And significant figures matter, too. When someone surveys people on a 5-point scale (employee satisfaction or customer satisfaction, e.g.) and then reports improvement in decimals, I cringe! Show me an improvement at least as big as the smallest increment of measure, please!
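For anyone who wants to build the three-line view Andrew describes (weekly data plus 13-week and 52-week rolling averages), here is a minimal sketch in Python with pandas; the data is randomly generated, purely for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical weekly metric -- three years of randomly generated
# values, for illustration only.
rng = np.random.default_rng(0)
weekly = pd.Series(
    rng.normal(loc=100, scale=8, size=156),
    index=pd.date_range("2009-01-04", periods=156, freq="W"),
)

rolling_13 = weekly.rolling(window=13).mean()  # roughly quarterly view
rolling_52 = weekly.rolling(window=52).mean()  # annual view

# Plotting all three together shows short-term noise against the
# longer-term direction (requires matplotlib):
ax = weekly.plot(alpha=0.4, label="weekly")
rolling_13.plot(ax=ax, label="13-week rolling avg")
rolling_52.plot(ax=ax, label="52-week rolling avg")
ax.legend()
```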

    • This is a great post. More than a decade ago, I found SPC to be a fairly well accepted part of the lean toolkit; its application was therefore an important part of my research on applying lean within a complex industry (aerospace). The results showed that SPC offers real benefits (as described in this post), but that the greatest results come when a facility has first attained certain foundational capabilities, which helps address challenges like those Andrew pointed out. This was a major focus in my Shingo Prize-winning book, “Breaking the Cost Barrier” (Wiley, 2000).

      I’m looking forward to the subsequent posts on this that you indicated will be coming.

  3. Great post. This is a conversation I have quite frequently with people. A trend chart is one of the most powerful pictures of sustainment. Without it, backsliding occurs more rapidly, in my experience.

    Too many people look at two data points and not the whole picture. By understanding the whole picture you can see how steady the improvement has or has not been.

  4. Good points, Mark. I see this too often. People misrepresent data or don't know how to interpret it. This is why I like to see the data displayed graphically and not just a before-and-after conclusion. Why not consider the error? I am sure that in the story you cite, the real benefits of the consultants' work are not easily captured and probably take more time to fully understand. I think that measure itself is a difficult one to analyze, since a lot can affect it.

    Thanks for sharing.

    • Tim – It’s often what some would call “statistical illiteracy” which is what I’d suspect in the case study I linked to. But sometimes people do try to snow you with selective metrics. If I wanted to “prove” that patient satisfaction got worse, I could just choose a different set of data points.

      Lies, damn lies, and statistics :-)

  5. Great blog Mark,

    I can recall having the same argument with management at a past employer. They were stressing the staff out to increase the response rate for client satisfaction from something like 77% to 85%, which I argued was an arbitrary value and would not likely result in any significant change in client ratings, as any change would be within the historical range of values (roughly 85-92% satisfaction, which I would think was very acceptable). I guess they felt that if they couldn't create a stretch target for higher satisfaction, they could at least focus on the response rate.

    • Thanks, Ron. You bring up another important issue: process specifications (targets) versus process capability.

      You're right not to want arbitrary targets and goals (like 85%), as that might cause a lot of dysfunction, especially if people fear punishment for NOT hitting the target.

      A control chart tells us the process capability, which, as Dr. Deming taught, merely allows us to predict the future. If the control limits for patient sat are 65% and 75% (an average of 70%), the control chart tells us the next month is likely to be between 65% and 75% due to common cause variation (statistical chance, basically). The process isn't capable of ever hitting 85% without making changes to the system.

      Trying to improve the system to have higher patient satisfaction is different than just setting a higher goal.

  6. Mark
    I think this points to another issue, related to the way metrics are used and viewed in healthcare. They improved their processes to reduce LOS, triage time, and ED-to-inpatient arrival time, but these did not change patient satisfaction. (Of course, I wonder exactly which patients they were surveying?)

    The hypothesis all along in ED healthcare is that reducing these metrics will result in the goal of improved patient satisfaction. They have nicely demonstrated that there must be something else that would make the difference.

    This data should start us looking beyond this sub-optimization for what really matters.

    Mark Jaben, MD

    • Great point, Mark – why didn’t patient satisfaction increase?

      I’d rather have safe, effective healthcare that people don’t have to wait for (outcomes, safety, L.O.S.). Patient satisfaction can come from irrelevant (to me) factors like a nice waiting room and plasma TVs to keep yourself busy while waiting for what might be unsafe care.

  7. Mark

    I think you are quite correct. But in the spirit of Kaizen, both should be achieved: safe, effective healthcare that people don’t have to wait for and satisfaction.

    If we judge how we are doing by an irrelevant, or less relevant, metric, like patient satisfaction, then we are not identifying what really matters.

    Patient satisfaction is a goal. How we achieve it depends on our processes, and these processes can be evaluated by metrics we choose. I am afraid to say that treating patient satisfaction as a metric results in metrics being used to judge rather than to improve. I am also afraid to say this is what happens so frequently in healthcare.

    So this might seem like semantics, but I think this distinction is a crucial mental model necessary to overcome the reluctance to engage in Lean or, worse, the notion that Lean doesn’t work.

    Mark Jaben, MD

    • Yes, I’m not saying ignore patient satisfaction or that it doesn’t matter. I’m saying that it’s maybe a “nice to have,” since what good does a nice friendly environment do if the basics of quality, patient safety, or flow aren’t addressed first? I’m also suggesting that some organizations think they can basically “buy” patient satisfaction by spending money on a nice environment and papering over the underlying process problems.

  8. Mark, I recently read a study of the effects of training on Children's Aid case workers. The agency did a survey of worker attitudes about working with families, the level of engagement, and “going to the gemba” to see what was actually happening in the homes. What a concept. The measures they used were typical social-work, touchy-feely, empathy-type measures.

    From my perspective, from reading your blog, sustainment needs to be measured long term. I totally agree. However, when they did the training and measured the change in attitude directly after the training, they found measurable change in worker attitudes. From a scientific standpoint, the changes must be repeatable and measurable, with a standard by which to measure them. None of these things were done in this case. For me, it was all about proving the efficacy of training to promote change.

    All I can say is that there is a pandemic of poor measures across many industries, and most of the time the measures are self-serving and mostly unreliable. Even cost savings are a short-term measure that ignores the systemic nature of costs. Who knows if the costs don't just show up somewhere else in the system, or later on in time? I'm exploring the whole issue of training efficacy (sorry for using that word twice) for lean, and for training in general. Thanks for your attention to this issue.
