Consulting Case Studies Need Statistical Validity


One of the books that's had the biggest impact on my work is Donald Wheeler's Understanding Variation: The Key to Managing Chaos. It's the best guide for applying simple and powerful “statistical process control” (or SPC) practices to management data and decision making.

I recently read a consulting case study, “Harris Methodist saves $648,695 through SIPOC process changes.”

That's a suspiciously precise number, but that's not my real beef.

The case study is a good one. It highlights how a Fort Worth, TX, hospital improved its E.D. processes, which led to a shorter length of stay (LOS), improved patient flow, and improved capacity.

The results claimed:

  • The total Triage cycle time, from patient arrival to bed placement, was reduced by 23 minutes.
  • The total ED-IP cycle time, from patient admit order to IP arrival, was reduced by 33 minutes.
  • The average LOS decreased from 97 to 61 minutes.
  • The average patient satisfaction increased from 87.9 to 89%.

One recent observation of mine is that consulting (or hospital) case studies should do more than report before/after. That's just two data points, and two data points don't make a trend, as they say.

A simple before/after comparison has no time scale. It raises the question of sustainability: kaizen events are notorious for a quick burst of excitement and improvement… but then what happens? “How do we sustain improvements?” is one of the most common questions in the Lean world. Virginia Mason once reported backsliding in 60% of their week-long Rapid Process Improvement Workshops. That's not a good sustainment rate.

Update: The Virginia Mason number comes from this article, “Seeking Perfection in Health Care: Applying the Toyota Production System to Medicine” (to be fair, it cites 2004 numbers, and they have undoubtedly gotten better since). They said:

During an assessment in late 2004 that reviewed and remeasured all improvement efforts to date, we were only holding the gains on about 40 percent of those changes, partially because it is easy to slip back into old ways of doing things if there is a lack of accountability and follow-through.

Back to the main story:

One way case study writers can show sustainment is with a time series chart or, better yet, a control chart (as Dr. Wheeler demonstrates in his book).

In the case I've linked to, the final statement is:

The average patient satisfaction increased from 87.9 to 89%

Is that at all sustained or statistically significant? We don't know with just two data points.

Thankfully, the linked case study does give us a time series chart, as shown below, so cheers to them for going beyond just the before and after.

Those of you who are familiar with control charts know that the change pictured above isn't really a statistical improvement. Ironically, they may have shown us the chart to try to bolster their case.

If Harris Methodist had a “stable system” before the change (which apparently took place in November 2008), then the next 12 data points show what appears to still be a stable system around the mean of 86.8%.

One of the SPC rules for detecting a statistical shift (the “Western Electric Rules”) is to have EIGHT consecutive data points above or below the mean. In the above chart, we only have FOUR. That's not statistically significant; it's not a process shift.
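The run rule described above is easy to check mechanically. Here is a minimal Python sketch, assuming the center line is the baseline mean (86.8% in this case) and that a point sitting exactly on the center line breaks a run (conventions vary between texts). The function names and the monthly values are hypothetical, made up for illustration; they are not the case study's data.

```python
def longest_run_same_side(values, center):
    """Length of the longest run of consecutive points strictly on one side
    of the center line. A point exactly on the line resets the run
    (an assumed convention; some texts skip such points instead)."""
    longest = current = 0
    prev_side = 0
    for v in values:
        side = (v > center) - (v < center)  # +1 above, -1 below, 0 on the line
        if side != 0 and side == prev_side:
            current += 1          # run continues on the same side
        elif side != 0:
            current = 1           # a new run starts on the other side
        else:
            current = 0           # on the center line: no run
        prev_side = side
        longest = max(longest, current)
    return longest


def has_run_signal(values, center, run_length=8):
    """Western Electric-style run rule: signal when run_length or more
    consecutive points fall on the same side of the center line."""
    return longest_run_same_side(values, center) >= run_length


# Hypothetical monthly satisfaction scores (%), NOT the case study's data.
monthly = [86.1, 87.5, 85.9, 86.4, 87.9, 88.2, 88.5, 89.0]
print(longest_run_same_side(monthly, 86.8))  # 4
print(has_run_signal(monthly, 86.8))         # False
```

Four points in a row above the mean, as in the made-up series, falls short of the eight-point rule, so no shift is signaled.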

Statistically, it's actually more accurate to say there was no improvement. The last four data points could be “common cause” variation, a.k.a. noise or statistical chance. They may have declared victory prematurely, as the next month's data would be just as likely to come in at 85% as at 88%, meaning the process hasn't necessarily gotten better.
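Whether 85% and 88% are both plausible next-month values is exactly what a control chart's natural process limits answer. Below is a minimal sketch of the XmR (individuals and moving range) chart that Wheeler uses in Understanding Variation: limits are the mean plus or minus 2.66 times the average moving range. The function names and the baseline numbers are hypothetical, and the sketch assumes limits are computed from the "before" period and then used to judge later months.

```python
from statistics import mean


def xmr_limits(values):
    """Natural process limits for an XmR chart.

    Center line = mean of the values.
    Limits = mean +/- 2.66 * average moving range, where each moving range
    is |x[i] - x[i-1]|. The 2.66 factor is 3 / d2, with d2 = 1.128 for
    ranges of two consecutive points.
    """
    if len(values) < 2:
        raise ValueError("need at least two values to compute a moving range")
    center = mean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    avg_mr = mean(moving_ranges)
    return center - 2.66 * avg_mr, center, center + 2.66 * avg_mr


def points_outside(values, lower, upper):
    """Indices of points beyond the natural process limits (a single-point signal)."""
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical "before" months (%), NOT the case study's data.
baseline = [86, 87, 85, 88, 86, 87, 86, 88]
lower, center, upper = xmr_limits(baseline)
print(points_outside([85, 88, 92], lower, upper))  # [2]
```

With this made-up baseline, both 85% and 88% land inside the limits (roughly 82% to 91%), so either would be routine noise; only the 92% month would be a signal worth investigating.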

It would be nice if we could have some agreement and standards about how to represent data like this in case studies, but that's not likely to happen.

I'm not saying the hospital or the consultant didn't make things better. I'm just saying that the above chart doesn't, on its own, prove that case.

Some of this might seem sort of esoteric if you haven't read Wheeler's book. Go get the book, or you can read articles on his website. I tried to cover this topic a bit in my book, Lean Hospitals: Improving Quality, Patient Safety, and Employee Engagement, as well.

Have you been able to apply SPC principles to your own management work? Have you been able to use SPC to help gauge if you really had a statistically significant process shift? If this doesn't make sense, ask questions and I'll do my best to respond in comments.

In the near future, I'll share some data from a former client of mine showing three years of sustainment (actually a few positive process shifts, improvements that are statistically significant).

What do you think? Please scroll down (or click) to post a comment. Or please share the post with your thoughts on LinkedIn – and follow me or connect with me there.

Mark Graban
Mark Graban is an internationally recognized consultant, author, professional speaker, and podcaster with experience in healthcare, manufacturing, and startups. Mark's new book is The Mistakes That Make Us: Cultivating a Culture of Learning and Innovation. He is also the author of Measures of Success: React Less, Lead Better, Improve More, the Shingo Award-winning books Lean Hospitals and Healthcare Kaizen, and the anthology Practicing Lean. Mark is also a Senior Advisor to the technology company KaiNexus.


    • How would you prove the difference is statistically significant? I didn’t do it in the example here (for one, there aren’t enough data points), but calculating control limits for the patient satisfaction chart would tell us if a single data point was significantly higher than the usual trend.

      • Actually, if you look carefully at the customer satisfaction graph, you can see some very faint blue shading in the background. These appear to be the varying control limits for what appears to be a c-control chart. The control chart boundaries show this is just common cause variation and no process change.

  1. Wow, an ED LOS of 61 minutes! I don’t believe it.

    Consultants and hospitals frequently overstate the value of reducing LWBS (patients who leave without being seen). LWBS patients leave because they aren’t all that sick, and a recent Canadian study shows that they have a much lower admission rate than the general ED population. My observation is that they also have a much higher incidence of charity care and bad debt.

    Imagine that. A consultant overstating their results!

  2. Mark:
    Most of the data we have to work with from the shop floor are NOT experimental, and are not easily (legitimately) subjected to t-tests, F-tests, regression analysis, etc.

    Looking at trends over time where lots of factors are uncontrolled (supervision, finance, the economy, how the Eagles did last night, etc.) it’s awfully hard to test the effect of the particular factor of interest (our deliberate changes/improvements).

    Our answer: don’t crow about anything but long-term trends. We follow data at the most granular level (typically weekly) but also plot the 13-week rolling average and the 52-week rolling average. Together, these three paint a pretty clear picture, though cause-and-effect associations are always going to be dicey in the real world where so many influences are uncontrolled.

    And significant figures matter, too. When someone surveys people on a 5-point scale (employee satisfaction, customer satisfaction, e.g.) and then reports improvement in decimals, I cringe! Show me an improvement at least as big as the smallest increment of measure, please!

    • This is a great post. More than a decade ago I found SPC to be a fairly well-accepted part of the lean toolkit; its application was therefore an important part of my research on applying lean within a complex industry (aerospace). The results showed that SPC offers real benefits (as described in this post), but that the greatest results come when a facility has first attained certain foundational capabilities, which helps address challenges like those Andrew pointed out. This was a major focus in my Shingo Prize-winning book, “Breaking the Cost Barrier” (Wiley, 2000).

      I’m looking forward to the subsequent posts on this that you indicated will be coming.

  3. Great post. This is a conversation I have quite frequently with people. A trend chart is one of the most powerful pictures of sustainment. Without it, backsliding occurs more rapidly, in my experience.

    Too many people look at two data points and not the whole picture. By understanding the whole picture you can see how steady the improvement has or has not been.

  4. Good points, Mark. I see this too often. People misrepresent data or don’t know how to interpret it. This is why I like to see the data displayed graphically, not just a before-and-after conclusion. Why not consider the error? I am sure that in the story you used, the real benefits of the consultants are not easily captured and probably take more time to fully understand. I think that measure itself is a difficult one to analyze, since a lot can affect it.

    Thanks for sharing.

    • Tim – It’s often what some would call “statistical illiteracy” which is what I’d suspect in the case study I linked to. But sometimes people do try to snow you with selective metrics. If I wanted to “prove” that patient satisfaction got worse, I could just choose a different set of data points.

      Lies, damn lies, and statistics :-)

  5. Great blog Mark,

    I can recall having the same argument with management of a past employer. They were stressing the staff out to increase the response rate for client satisfaction from something like 77% to 85%, which I argued was an arbitrary value and would not likely result in any significant change in client ratings, as any change would be within the historical range of values (roughly 85–92% satisfaction, which I thought was very acceptable). I guess they felt if they couldn’t create a stretch target to have higher satisfaction, they could at least focus on the response rate.

    • Thanks, Ron. You bring up another important issue: process specifications (targets) versus process capability.

      You’re right to not want arbitrary targets and goals (like 85%) as that might cause a lot of dysfunctions, especially if people fear punishment for NOT hitting the target.

      A control chart tells us the process capability, which, as Dr. Deming taught, merely allows us to predict the future. If the control limits for patient sat are 65% and 75% (average of 70%), the control chart tells us the next month is likely to be between 65% and 75% due to common cause variation (statistical chance, basically). It’s not capable of ever hitting 85% without making changes to the system.

      Trying to improve the system to have higher patient satisfaction is different than just setting a higher goal.

  6. Mark
    I think this points to another issue, related to the way metrics are used and viewed in healthcare. They improved their processes to reduce LOS, triage time, and ED-to-inpatient arrival time, but these did not change patient satisfaction. (Of course, I wonder exactly which patients they were surveying.)

    The hypothesis all along in ED healthcare is that reducing these metrics will result in the goal of improved patient satisfaction. They have nicely demonstrated that there must be something else that would make the difference.

    This data should start us looking beyond this sub-optimization for what really matters.

    Mark Jaben, MD

    • Great point, Mark – why didn’t patient satisfaction increase?

      I’d rather have safe, effective healthcare that people don’t have to wait for (outcomes, safety, L.O.S.). Patient satisfaction can come from irrelevant (to me) factors like a nice waiting room and plasma TVs to keep yourself busy while waiting for what might be unsafe care.

  7. Mark

    I think you are quite correct. But in the spirit of Kaizen, both should be achieved: safe, effective healthcare that people don’t have to wait for and satisfaction.

    If we judge how we are doing by an irrelevant, or less relevant, metric, like patient satisfaction, then we are not identifying what really matters.

    Patient satisfaction is a goal. How we achieve it depends on our processes and these processes can be evaluated by metrics we choose. I am afraid to say treating patient satisfaction as a metric results in metrics being used to judge, rather than for improvement. I am also afraid to say this is what happens so frequently in healthcare.

    So this might seem like semantics, but I think this distinction is a crucial mental model necessary to overcome the reluctance to engage in Lean or, worse, the notion that Lean doesn’t work.

    Mark Jaben, MD

    • Yes, I’m not saying ignore patient satisfaction or that it doesn’t matter. I’m saying that it’s maybe a “nice to have,” since what good does a nice friendly environment do if the basics of quality, patient safety, or flow aren’t addressed first? I’m also suggesting that some organizations think they can basically “buy” patient satisfaction by spending money on a nice environment and papering over the underlying process problems.

  8. Mark, I recently read a study of the effects of training on Children’s Aid case workers. The company did a survey of worker attitudes about working with families, the level of engagement, and “going to the gemba” to see what was actually happening in the homes. What a concept. The measures they used were typical social-work, touchy-feely empathy measures. From reading your blog, I take it that sustainment needs to be measured long term, and I totally agree. However, when they did the training and measured attitudes directly afterward, they found measurable change in worker attitudes. From a scientific standpoint, the changes must be repeatable and measurable, and there must be a standard by which to measure them. None of these things were done in this case. To me, it was all about proving the efficacy of training to promote change. All I can say is there is a pandemic of poor measures across many industries, and most of the time the measures are self-serving and mostly unreliable. Even cost savings are a short-term measure that ignores the systemic nature of cost savings. Who knows whether the costs just show up some other place in the system or later on in time? I’m exploring the whole issue of training efficacy (sorry for using that word twice) for lean or training in general. Thanks for your attention to this issue.
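The approach in comment 2 (plotting weekly data alongside 13-week and 52-week rolling averages) can be sketched as a trailing rolling mean. This is a minimal Python sketch under stated assumptions: the function name and the weekly data are hypothetical, and it returns `None` until a full window of points exists rather than averaging a partial window.

```python
from collections import deque


def rolling_mean(values, window):
    """Trailing rolling average over the last `window` points.
    Returns None for positions before a full window is available."""
    if window < 1:
        raise ValueError("window must be at least 1")
    out = []
    buf = deque()
    total = 0.0
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            total -= buf.popleft()  # drop the oldest point from the sum
        out.append(total / window if len(buf) == window else None)
    return out


# Hypothetical weekly scores, NOT real data.
weekly = [70 + (i % 5) for i in range(60)]
r13 = rolling_mean(weekly, 13)  # quarter-ish trend
r52 = rolling_mean(weekly, 52)  # year-long trend
```

Plotting `weekly`, `r13`, and `r52` on the same axes gives the three views the commenter describes; the longer window smooths out more of the week-to-week noise at the cost of reacting more slowly to a real shift.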

