There's an expression that was used by Dr. W. Edwards Deming, and also by Dr. Don Wheeler, that says managing through metrics (and comparisons to targets) is like driving by looking in the rear-view mirror.
As Deming said: "Taking action on the basis of results, without theory of knowledge, without theory of variation, without knowledge about a system… anything goes wrong, do something about it, overreacting, acting without knowledge, the effect is to make things worse. With the best of intentions and best efforts, managing by results is, in effect, exactly the same as, as Dr. Myron Tribus put it, driving your automobile while keeping your eye on the rear-view mirror. What would happen? And that's what management by results is: keeping your eye on results."
Wheeler used this imagery and quoted Tribus as well in his great (GREAT!) book Understanding Variation (see my thoughts on how influential it has been for me):
Monthly reports, weekly reports, daily metrics — they can all drive the same dysfunctions if they are managed using simplistic “red/green” reporting.
Let's look at a scenario where we have a weekly “satisfaction” score (could be patients, employees, customers… ignore where the data comes from – it's just there).
Management might set a target of 80. What is this based on?
- It could be arbitrary (which is bad)
- It could be based on a benchmark or an industry percentile (which might not be helpful)
- It could be based on last year's number, but a little better (let's say last year's average was 78)
Managing by Red/Green Comparisons
How do people manage with these comparisons? Let's look at a very common and realistic scenario that would play out over time.
In this red/green approach, we compare each data point to the target. Again, this target might be very arbitrary.
This weekly score might be part of a “dashboard” that has many other measures on it. It might have too many different measures, which means the car “dashboard” analogy quickly breaks down, but that's another blog post.
The score is higher than our target. Management congratulates the team. "Way to go!" They might write that on the dashboard. Management pats themselves on the back and says, "Yeaaaaaa, we are skilled managers. We set a target and that inspired people. Hurray!"
Life goes on.
The next week, the dashboard comes in:
Oh no! Below the target. Management says, “You all have to try harder. We know you can get an 82. Look, you did it last week.” The managers say privately, “Ah, see, people slack off when you give them too much praise. We'll remember that.”
Life goes on.
The next week, the dashboard looks like this:
Management says, “Oh no…. we were too hard on them last week.” But they still rally the team to say, “You have to do better! Give us your improvement ideas!”
Life goes on. No improvement ideas come forward (which is management's fault, if anybody's, by the way).
The next week's numbers come in:
What gives? Last year, they averaged 78. We set a goal of 80. Now, they can't even be above average!! The threats or the promises of rewards might get stronger.
Life goes on. Management has not asked for improvement ideas because “we already did that.”
Red again. Management says, “We need to put that supervisor on a performance improvement plan, because they clearly aren't performing well.”
Fear & Pleasing the Boss
It's important to take a time out to think about what happens when people are pressured to hit a target.
Brian Joiner wrote, in his outstanding book Fourth Generation Management: The New Business Consciousness, that there are three things that can happen when you have a quota or a target imposed upon people:
- Distort the system
- Distort the numbers
- Improve the system
See examples on my blog about “gaming the numbers.”
Without a method for improvement, the pressure to hit the target and to be "green" will drive people to distort the system or the numbers. It could be the negative pressure of "hit the target or else!" or the positive pressure of "hit the target and we'll give you a reward." Either way, it can get very dysfunctional very quickly.
Managers might get creative about making sure certain patients or staff don't get the survey to complete. Or, they might do things that artificially encourage people to give good rankings. I call that “begging for scores” instead of improving the scores. Our focus needs to be on improvement, not just the score. We need to actually improve the system, not distort it.
Back to the Red/Green Comparisons
Life goes on.
GREEN! Glorious green!
Management tells the team, "We knew you could do it! It just took more attention and hard work!" They say to themselves, privately, "Good, that PIP got her attention. Remember that."
Life goes on.
The next week's scoreboard goes up:
RED. Blasted red. “We thought things were getting better!”
OK, this is exhausting. And for good reason.
Managing this way is like driving with blinders on. You're looking at each data point in ISOLATION. You're simplistically comparing it to the target as "good" or "bad." Wheeler calls this a "binary world view" – good or bad. Life is more complicated than that.
Data Need Context – Run Charts Help!
Instead of just looking at a data point and the target, a simple "run chart" can help. (As an aside, this should be a LINE chart, not a bar/column chart, for a number of reasons.)
What is the run chart of these nine weeks of data?
Hmmm, now we can look for trends. There are ups and downs in the data. There's what looks like "noise." Is there any signal?
Note I didn't draw the target line on the graph. That target might be pretty arbitrary. We already know that we're generally not meeting the target, for what that's worth.
What's more helpful is putting the “mean” or average on the chart.
You might not be surprised that three data points are above the mean, three are right about on it, and three are below the mean.
The mean for those first nine data points is 76.43. Slightly below last year's average.
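If you want to play with this yourself, here's a minimal Python sketch of a run chart like that. The nine weekly scores below are placeholder values I made up to roughly match the pattern described (a mean near 76.4, with three points above it, three right about on it, and three below); they are not the post's actual data:

```python
import matplotlib.pyplot as plt

# Placeholder weekly scores -- made up to roughly match the description
# above (mean near 76.4), NOT the actual survey data
scores = [82, 74, 81, 76, 71, 83, 76, 69, 76]
weeks = range(1, len(scores) + 1)
mean = sum(scores) / len(scores)

plt.plot(weeks, scores, marker="o")  # a LINE chart, not bars
plt.axhline(mean, linestyle="--", label=f"Mean = {mean:.2f}")
plt.xlabel("Week")
plt.ylabel("Satisfaction score")
plt.title("Run Chart: Weekly Satisfaction Score")
plt.legend()
plt.show()
```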
Most managers would ask, “How does the data compare to the target?”
Deming and Wheeler would ask us to ask, “Does the chart suggest that the process is in control?”
To know that, we create a “control chart” as a “Statistical Process Control” (SPC) method. Since you generally want to use 20 data points to create the “control limits” on the chart, I've moved time forward through 20 weeks of results (the mean is now 77.5).
The red/green minded manager might notice that only six of the 20 points are above the target. Slackers!
Here is what the control chart looks like, with the “3 sigma” Upper Control Limit (UCL) and Lower Control Limit (LCL) – sparing the details of how those limits are calculated.
This is what we'd call a stable system – it's “in control.”
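For the curious, here's a sketch of one common way those limits are computed for an individuals (XmR) chart, the chart type Wheeler generally recommends: estimate sigma from the average moving range, then set the limits three of those sigmas above and below the mean. The 2.66 factor is 3 divided by the d2 constant of 1.128:

```python
def xmr_limits(data):
    """Center line and 3-sigma limits for an individuals (XmR) chart,
    estimated from the average moving range."""
    mean = sum(data) / len(data)
    # Moving ranges: absolute differences between consecutive points
    moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    # 2.66 = 3 / 1.128, where 1.128 (d2 for n=2) converts the
    # average moving range into an estimate of sigma
    ucl = mean + 2.66 * mr_bar
    lcl = mean - 2.66 * mr_bar
    return lcl, mean, ucl
```

Fed the 20 weeks of data described here (mean 77.5), a calculation along these lines is what puts the limits near 65 and 90.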
In fact, the data for the control chart was created by Excel's random number generator, using a normal distribution with an average of 78.
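If you'd rather reproduce that experiment in Python than in Excel, something like this works (the standard deviation of 4 comes from a detail mentioned later in this post):

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # any seed; chosen arbitrarily
# 20 weekly scores from a simulated stable system: normal, mean 78, sd 4
scores = rng.normal(loc=78, scale=4, size=20)
```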
All of those data points were created by the same system. The variation from point to point is statistical noise in the system. It's statistically possible to have six data points in a row above the mean and still have them all be part of the same system. Red/green-minded managers might think there is improvement (even though only weeks 16 and 17 were "green"). It's more than likely that Week 21 would be below average.
You might wonder what the relationship between a random number generator and a real workplace would be. Well, if we have a stable workplace system (the same staff, the same procedures, the same workloads, the same physical space, etc.), then we'd expect similar outcomes each week in the satisfaction score. Some weeks will be higher than others. Some will be above average and some below. That's just noise in the data. We shouldn't overreact to each data point, and SPC shows us how to avoid doing so.
Being “in control” allows us to predict the following with pretty good certainty:
- Any given week's satisfaction score is going to be between 65 and 90.
- The current system is incapable of delivering scores of 80 or above each week.
Manage the System, Not Each Data Point
Deming and Wheeler teach us to focus on the process that generates these stable results. If we want to increase satisfaction scores, we have to reduce the sources of “common cause” variation. What are the things that patients or staff frequently complain about? If we eliminate some of those problems, we'll increase the average score and probably reduce variation in the score.
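One simple way to decide which common causes to attack first is to tally complaint themes and rank them, Pareto-style. Here's a sketch with made-up categories (in practice, these would come from survey comments or staff feedback):

```python
from collections import Counter

# Hypothetical complaint themes, for illustration only
complaints = ["wait time", "noise", "wait time", "parking",
              "communication", "wait time", "noise"]

# Tally and rank the themes, most frequent first (a simple Pareto view)
for theme, count in Counter(complaints).most_common():
    print(f"{theme}: {count}")
```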
Without improvement, the control chart might continue like this, where everything is, of course, inside those control limits.
If we have a single week where the score is below 65, we should ask, "What happened?" With a signal like that, we can finally ask, "What happened that week? What was different? What went wrong?" Rather than blaming individuals, we need to look at the system. Those questions need to be followed up with, "What can we do to prevent that problem from occurring again?"
Likewise, if we ever had a single week with a score of 90 or greater, we should ask, “What happened?” Just as it's statistically unlikely to have a random score below 65, it's unlikely to have a score above 90 unless something changed. Maybe staff were experimenting with some new approaches. We want to make sure those become the new standard practice and, hopefully, see sustained improvement.
As Regis Philbin (or Dana Carvey impersonating him) might say — this chart “is out of control!” There is most likely a “special cause” to be found.
Asking people to explain each up and down in a stable process is a waste of time. There's no special cause occurring in any particular week in a stable system. It's all noise, all "common cause" variation. We can't usefully ask, "What happened last week?" We instead need to ask, "What's happening each week that leads our performance to be lower than we want it to be?"
As we look at a control chart over time, there are different "Western Electric Rules" that tell us when the system is no longer stable and in control. One of those rules is to look for nine consecutive points above the mean. That's a statistically significant signal that the system has changed. Not six points in a row above the mean (remember, that can occur randomly), but nine. It's very unlikely that nine consecutive points above the line would occur through chance alone.
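As a sketch of how you might check for those signals in code, here are two of the rules: a point outside the 3-sigma limits, and nine consecutive points on one side of the mean:

```python
def find_signals(data, lcl, mean, ucl, run_length=9):
    """Flag two special-cause signals: points outside the control
    limits, and runs of `run_length` consecutive points on one
    side of the center line."""
    outside = [i for i, x in enumerate(data) if x < lcl or x > ucl]

    runs, streak, prev_side = [], 0, 0
    for i, x in enumerate(data):
        side = 1 if x > mean else (-1 if x < mean else 0)
        streak = streak + 1 if (side == prev_side and side != 0) else 1
        prev_side = side
        if side != 0 and streak == run_length:
            runs.append(i)  # index where the run first reaches run_length
    return outside, runs
```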
Here's what a process shift like that looks like:
That's a stable system at a new level of performance. The mean is higher and there's slightly less variation (I reduced the standard deviation in the random number generator from 4 to 3.5).
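Simulating a shift like that is straightforward, too. Here the before/after standard deviations (4 and 3.5) follow the numbers above; the new mean of 82 is just my assumption, since the post doesn't say how much the average rose:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
before = rng.normal(loc=78, scale=4, size=20)   # original stable system
after = rng.normal(loc=82, scale=3.5, size=20)  # shifted system (82 is assumed)
shifted = np.concatenate([before, after])

# Using the sketches above: limits from the stable period,
# then scan the full series for signals
lcl, mean, ucl = xmr_limits(list(before))
outside, runs = find_signals(list(shifted), lcl, mean, ucl)
```

With a shift of that size, the nine-in-a-row rule will usually fire somewhere in the second half of the series.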
The lower control limit of 75 tells us we might occasionally have a week under 80. But, again, who cares about an arbitrary target? What matters is the improvement. That's what should be rewarded, not hitting some target.
What Are Your Thoughts?
Have you used methods like this in your organization? Can you relate to the problems caused by red/green comparisons and being managed by pressure? Please leave a comment and share your story. Would you like to learn more about this? I've been talking with Mike Stoecklein about co-authoring an eBook on this topic that would include how-to examples for creating control charts and using them.