[Updated] Reader Question: Why are There so Many Data Points Outside of the Limits?
I got a message from a reader of my book Measures of Success: React Less, Lead Better, Improve More.
The questions were:
“I entered data from Press Ganey patient satisfaction responses (12-month running totals) and since the numbers were so close to each other, the PBC showed a bunch of data points outside the upper/lower process limits.
When I changed the limits to +- 3 sigma, this issue disappeared. Have you run into this issue before? Can you help me to understand the difference and its impact on analysis using the PBC?”
Edit: “12-month running totals” means 12-month moving averages, as I later clarified. I'd rather plot each month's actual data and then use a Process Behavior Chart to filter out “noise” in the data. I've made up some data to create a situation where we have actual monthly data that creates the same moving averages as shown below.
Run Chart of the Moving Average
A simple run chart, which looks like this, is made from the moving-average patient satisfaction data. There is a pronounced downward trend (although the Y-axis scale goes only from 91 to 94, so please don't be misled by that):
A falling moving average implies that patient satisfaction is getting worse… but is it getting significantly worse or not? Answering that from a moving average requires a lot of mental gymnastics, which is why I think it's better to just plot each month's data point.
Process Behavior Chart of the Moving Average
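Mathematically, we can do this, but I don't think it's a good idea. Each 12-month moving average shares 11 of its 12 months with the one before it, so the point-to-point changes are tiny, and limits calculated from those changes come out artificially narrow.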
His initial Process Behavior Chart for the moving average numbers looked like this:
If you know the PBC methodology, that chart looks troubling, right? This is what concerned the reader.
But, again, maybe we shouldn't be plotting the moving average numbers… he was plotting what was given to him, but I suggested he push back and try to get actual monthly numbers.
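To see why a moving average tends to break the limits, here is a rough Python sketch using simulated noise (not the reader's data). The function names, the seed, and the "stable process around 93" are all my assumptions for illustration. Each moving average differs from the previous one by only (newest month minus dropped month) / 12, so I'd expect the MR-bar, and therefore the limits, to shrink dramatically even though nothing about the underlying process has changed.

```python
import random


def trailing_average(values, window=12):
    """Average of the most recent `window` values at each point."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


def limit_summary(values):
    """Return (mr_bar, lower, upper, points_outside) for an X chart."""
    average = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    lower = average - 3 * mr_bar / 1.128
    upper = average + 3 * mr_bar / 1.128
    outside = sum(1 for v in values if v < lower or v > upper)
    return mr_bar, lower, upper, outside


random.seed(1)  # arbitrary seed so the run is repeatable
# Simulated noise only: a stable process around 93, not real survey data.
monthly = [93 + random.gauss(0, 1) for _ in range(120)]
smoothed = trailing_average(monthly)

for name, series in (("monthly", monthly), ("12-month moving average", smoothed)):
    mr_bar, lower, upper, outside = limit_summary(series)
    print(f"{name}: MR-bar={mr_bar:.3f}, limits={lower:.2f} to {upper:.2f}, "
          f"points outside={outside} of {len(series)}")
```

Comparing the two printed lines shows how much narrower the limits become for the smoothed series, which is the same pattern the reader ran into.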
Plotting the Actual Data
Since the reader gave me 25 data points of the “moving average” number (25 months of moving averages), I had to create dummy data for 11 months prior… which led to a total of 36 data points.
I created monthly numbers in a way that led to the moving average numbers being the same as what the reader provided. Periods -10 through 0 are made-up data for the purposes of illustration.
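For anyone who wants to try the same exercise, here is one possible way to back into monthly numbers from a list of moving averages. This is a sketch, not necessarily how my spreadsheet was built: the function names and every number below are hypothetical. Once the 11 seed months are chosen, each later month is forced, because the newest month has to bring the 12-month sum up to 12 times the reported average.

```python
def monthly_from_moving_average(seed_months, moving_averages, window=12):
    """Build monthly values whose trailing moving average matches the input.

    seed_months: window - 1 made-up values for the periods before the
        first reported average.
    moving_averages: the reported trailing averages, oldest first.
    """
    if len(seed_months) != window - 1:
        raise ValueError("need exactly window - 1 seed months")
    months = list(seed_months)
    for avg in moving_averages:
        # The previous window - 1 months are already fixed; the new month
        # must make the sum of the last `window` months equal window * avg.
        recent = months[len(months) - (window - 1):]
        months.append(window * avg - sum(recent))
    return months


def trailing_average(values, window=12):
    """Average of the most recent `window` values at each point."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


# Hypothetical inputs, not the reader's data.
seed = [93.0, 94.0, 92.5, 93.5, 93.0, 94.5, 92.0, 93.0, 94.0, 93.5, 92.5]
reported = [93.2, 93.1, 93.0]

months = monthly_from_moving_average(seed, reported)
print([round(m, 2) for m in months[-3:]])               # reconstructed months
print([round(a, 2) for a in trailing_average(months)])  # should match `reported`
```

Different seed months give different monthly numbers with the same moving averages, which is a reminder that the moving average hides what actually happened each month.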
|Date||Monthly #||12-Month Moving Average|
A run chart of the actual monthly data (not the moving average) might look like the chart below. I've had Excel create a 12-month moving average (shown in red) so you can check my work (or the spreadsheet is here).
The moving average line smooths out variation, but it makes it look like things were getting worse because the moving average was declining for a while before flattening out… but the actual data might just be fluctuating. I wonder how many managers would react to that moving average “trend” and demand an explanation?
Now, we need a Process Behavior Chart to tell us if this is all routine variation or if there are any signals to be found.
A Process Behavior Chart for the Monthly Data
We can use the first 25 data points as a baseline for calculating the average and the “lower and upper natural process limits.” That lets us test the question, “Is this a metric that's fluctuating around the average in a predictable way?”
The formulas for the lower and upper limits are
= AVERAGE +/- 3 * MR-bar / 1.128
The MR-bar is the average of the “moving ranges,” or the absolute values of the differences between consecutive data points, as shown partially below:
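If you'd rather see the calculation outside of a spreadsheet, here is a minimal Python sketch of the formula above. The function name and the five baseline values are hypothetical, chosen only so the arithmetic is easy to follow by hand.

```python
def natural_process_limits(baseline):
    """Average, MR-bar, and lower/upper natural process limits for a baseline."""
    average = sum(baseline) / len(baseline)
    # Moving range: absolute difference between each pair of consecutive points.
    moving_ranges = [abs(curr - prev) for prev, curr in zip(baseline, baseline[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    half_width = 3 * mr_bar / 1.128
    return {
        "average": average,
        "mr_bar": mr_bar,
        "lower": average - half_width,
        "upper": average + half_width,
    }


# Hypothetical baseline: moving ranges are 1, 2, 1, 0, so MR-bar is 1.0.
limits = natural_process_limits([93.0, 92.0, 94.0, 93.0, 93.0])
print({name: round(value, 2) for name, value in limits.items()})
```

In practice you'd pass in the baseline months (the first 25 data points here) and then plot every month against the resulting limits.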
It's 3 Sigma, not 2.66 Sigma
The formula for the limits is an approximation of +/- 3 sigma… but we're using the MR-bar and the statistical constant of 1.128 to estimate this instead of using a calculated standard deviation.
The formula for the limits can be simplified as:
= AVERAGE +/- 2.66 * MR-bar
And that sometimes gets confused with being “2.66 sigma” which is not the case. So, to the second part of the question (which was based on his Process Behavior Chart of the moving average):
“When I changed the limits to +- 3 sigma, this issue [with the moving average PBC] disappeared.”
I wrote back to the reader and explained that the limits were already +/- 3 sigma, so no adjustment was necessary (nor would that be appropriate).
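Written out, the two versions of the formula are the same thing. The notation here is my own shorthand: X-bar is the AVERAGE, MR-bar is the average moving range, and sigma-hat is the estimate of sigma that MR-bar / 1.128 gives us.

```latex
\hat{\sigma} = \frac{\overline{MR}}{1.128}
\qquad\Longrightarrow\qquad
\bar{X} \pm 3\,\hat{\sigma}
  = \bar{X} \pm \frac{3}{1.128}\,\overline{MR}
  \approx \bar{X} \pm 2.66\,\overline{MR}
```

So the 2.66 multiplies the MR-bar, not sigma. The limits sit at an estimated 3 sigma either way.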
Back to the Monthly Process Behavior Chart
So, let's look at the PBC (X Chart and MR Chart) that's based on the baseline of the 25 monthly data points:
This is clearly not a single predictable system over time… using our three main rules for finding signals, there are signals galore:
Let's mark those signals (all Rule 1 signals below the lower limit):
I don't see any Rule 2 or Rule 3 signals.
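Checking the rules by eye works for a short chart, but here is a rough Python sketch of the same check. This post doesn't spell out the three rules, so the sketch assumes the definitions commonly used with Process Behavior Charts: Rule 1 is any point outside the limits, Rule 2 is eight consecutive points on the same side of the average, and Rule 3 is three out of four consecutive points closer to the same limit than to the average. The function name, the limits, and the data are hypothetical.

```python
def find_signals(values, lower, average, upper):
    """Return the positions where each (assumed) rule fires.

    Rule 1 lists the index of each point outside the limits.
    Rules 2 and 3 list the starting index of each qualifying window.
    """
    rule1 = [i for i, v in enumerate(values) if v < lower or v > upper]

    rule2 = []
    for start in range(len(values) - 7):
        run = values[start:start + 8]
        if all(v > average for v in run) or all(v < average for v in run):
            rule2.append(start)

    # "Closer to a limit than to the average" means beyond the midpoint.
    upper_mid = (average + upper) / 2
    lower_mid = (average + lower) / 2
    rule3 = []
    for start in range(len(values) - 3):
        window = values[start:start + 4]
        near_upper = sum(1 for v in window if v > upper_mid)
        near_lower = sum(1 for v in window if v < lower_mid)
        if near_upper >= 3 or near_lower >= 3:
            rule3.append(start)

    return {"rule1": rule1, "rule2": rule2, "rule3": rule3}


# Hypothetical data and limits, not the chart in this post.
scores = [93.2, 92.8, 90.5, 93.1, 92.9, 93.4, 90.2, 93.0]
signals = find_signals(scores, lower=91.0, average=93.0, upper=95.0)
print(signals)
```

In this made-up series, only the two points below the lower limit are flagged, which is the same kind of picture as the chart above: a few Rule 1 signals and otherwise routine variation.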
We have three months in which the patient satisfaction score was below the lower limit. Other than those three months, it looks like a “predictable system” that's just fluctuating around an average of 93.
We could ask, “Why was patient satisfaction lower those three months?” The organization should find “special causes” for those data points (or it's the same special cause each month, perhaps).
We shouldn't ask for explanations of other individual months that are not signals.
The organization still might want to form a hypothesis around “how can we improve patient satisfaction?” Notice that's a different question than “how can we improve the scores?”
We'd want to improve actual satisfaction (although our measure is, at best, an estimate of that) and not just distort the numbers (for example, by starting a campaign to ask patients to give better scores whether they are deserved or not).
So, in conclusion, the Process Behavior Chart will help us characterize a system as being:
- Predictable or
- Not predictable
And we can do so in a somewhat binary way. When we have a predictable system, it's not a good use of time to ask for or look for an explanation for every up and down in the metric. Don't react to noise… DO react to signals and DO work to improve the system… react less, lead better, and improve more.
Again, Plot the Actual Numbers
I think another conclusion is, when in doubt, plot the actual numbers, not the moving average. The reader said that Press Ganey provided the moving average number because the number of surveys is small.
I'd still rather see the actual numbers. If there's a lot of noise because of the small sample sizes, the chart will adjust for that by showing wider limits. We can use the PBC to distinguish between signal and noise.
The reader pointed out, in an additional exchange, that plotting the moving average means that the effect of any change we make NOW on patient satisfaction will take a while to show in the moving average chart. That's another argument for plotting the actual numbers.
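The reader's point about the delay is easy to illustrate with a small Python sketch. The numbers are hypothetical: a score that sits at exactly 90 for a year and then jumps to 93 and stays there. The function name is mine.

```python
def trailing_average(values, window=12):
    """Average of the most recent `window` values at each point."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


# Hypothetical: 12 months at 90, then a real improvement to 93.
monthly = [90.0] * 12 + [93.0] * 12
smoothed = trailing_average(monthly)

for months_after_change, value in enumerate(smoothed):
    print(f"{months_after_change:2d} months after the change: "
          f"moving average = {value:.2f}")
```

The monthly number shows the full three-point improvement right away, while the moving average creeps up a quarter point at a time and needs a full 12 months to catch up.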