Two Data Points Don’t Make a Trend, So Read Beyond the Headline
I've written a lot about the idea that two data points don't make a trend.
But many organizations and leaders fall into the trap of comparing a data point only against the previous time period or against the same period a year earlier. The news media makes a lot of these overly simplistic comparisons, as I recently saw in this headline about San Antonio, a city that was my home for three years and that I really love (and there's more to love than the Riverwalk):
The article includes a “cause and effect” claim from the city:
“The drop is credited to a multi-agency task force created to tackle violent crime.”
When analyzing data for improvement, we need to have some knowledge about the system that generates data and results. It's not enough to say that a metric dropped. If it's in the range of “common cause variation,” it could just be a fluctuation that doesn't really have a specific cause.
It's possible that either:
- This task force led to a significant drop in the homicide rate
- This task force was just there and the homicide rate dropped because the rate fluctuates and this drop is within that range of “common cause” variation
Having just two data points makes it impossible to know whether the drop in 2017 was in the range of common cause variation or whether it was a statistically significant signal implying that something changed.
The article text provides a bit more data and context (an important reason to read beyond the headline):
“…2016 was one of San Antonio's most violent years with nearly 150 homicides. That number dropped in 2017 by about 16 percent to 125.”
This implies that 150 was higher than usual. How much so?
We still don't know if a 16% drop is in the range of typical common cause variation or if it's meaningful. How much did the homicide rate jump in 2016?
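The quoted percentage is easy to check. Here is a minimal sketch of the arithmetic; the function name is just illustrative, and the 149 figure is the exact 2016 count cited later in this post (the article only says "nearly 150").

```python
def percent_change(old: float, new: float) -> float:
    """Percentage change from old to new (a negative result means a drop)."""
    return (new - old) / old * 100.0


# Using the rounded "nearly 150" from the article text.
print(f"{percent_change(150, 125):.1f}%")  # -16.7%

# Using the exact 2016 count of 149 cited later in this post.
print(f"{percent_change(149, 125):.1f}%")  # -16.1%
```

Either way, the result is "about 16 percent," but the calculation says nothing about whether a change of that size is unusual for this system.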
The article also tells us:
“After an average of 90 homicides annually between 2011 and 2015…”
So 2016 and 2017 were both “above average” from the baseline years of 2011 to 2015. That's still not quite enough information to draw a “process behavior chart” that would help answer the question about whether or not 2016 or 2017 are “signals” or “noise” in the data. We would need the data points for each year from 2011 to 2015 (or, better yet, more data) to be able to calculate the upper and lower process behavior limits for the chart.
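For readers who want to see the mechanics, here is a minimal sketch of one common way to calculate these limits. It assumes the standard XmR ("individuals and moving range") formulation, where the limits sit at the average plus or minus 2.66 times the average moving range; that constant is the usual convention rather than something stated in this post. The function name is hypothetical and the numbers are made up for illustration. They are not the San Antonio data.

```python
from statistics import mean


def process_behavior_limits(values):
    """Return (average, lower_limit, upper_limit) for an XmR-style chart."""
    if len(values) < 2:
        raise ValueError("need at least two data points")
    average = mean(values)
    # Moving range: absolute difference between each pair of consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    avg_moving_range = mean(moving_ranges)
    # 2.66 is the conventional XmR scaling constant (an assumption here).
    spread = 2.66 * avg_moving_range
    return average, average - spread, average + spread


# Made-up illustrative counts, NOT real homicide data.
example = [50, 54, 48, 52, 56]
avg, lower, upper = process_behavior_limits(example)
print(f"average={avg}, lower={lower:.2f}, upper={upper:.2f}")
# average=52, lower=40.03, upper=63.97
```

With only a handful of baseline points, as in this toy example, the limits are rough, which is why more baseline data is better.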
It's possible that 2016 and 2017 are noise… it's possible that 2016 was a signal.
I went online to try to find the actual data.
This headline doesn't answer the question about signal vs. noise either:
The fact that 2016 was a “21-year high” doesn't automatically mean that it's a signal or that there's a special cause answer to the question of “why?”
It's possible for a stable and predictable system to generate a “21-year high” that's still within the calculated process behavior limits. If that's the case, reducing the homicide rate would require more systemic analysis and problem solving that probably won't come from reactively asking “why was last year's number high?”
That article does add some context:
“That's a stunning 61 percent increase over 2015.”
But, again, we don't know how much the homicide rate fluctuates normally from year to year. Had there been a stunning decrease from 2014? Is “stunning” the same as “statistically significant?”
The City of San Antonio provides data on their website. That data only goes back to 2011.
Starting with the annual data, the process behavior chart would look like what you see below (recognizing that seven data points don't create the most reliable limits; having 20 to 25 data points to start is more valid, statistically).
This process behavior chart, created with the data from 2011 to 2017, implies that every data point is “noise.” It implies the same system is leading to variation in the data from year to year.
There's no likely explanation for why 2016 had a higher homicide rate other than “it's higher some years than others.” We might not like to see an average of 100 homicides per year. But this chart suggests that we'd expect to see between 40 and 160 homicides in 2018 if nothing changes in the system.
Creating a Chart with More Baseline Annual Data Points
Digging deeper (meaning more Google searches) brought me data from before 2010. The San Antonio Express-News has data going back to 2007 that differs from the City of San Antonio data. So, I made another process behavior chart using their data through 2015:
Again, that looks like a classic “in control” or “stable and predictable system” – with the caveat, again, that the baseline for the average and the limits is based on just nine data points. Four points are above average and five are below – not unusual. There didn't appear to be any trend or “special cause” or “signal” in those years. Some years, there are more homicides than others.
With this baseline data, the calculated process behavior limits were at 75 and 150, meaning we would predict, at the start of 2016, that the year would bring anywhere between 75 and 150 homicides unless something changed in the “system” of life there.
As 2016 played out, the 149 homicides reported is just below the calculated limit of 149.57 (to be more precise). That tells us it's likely that there could have been 149 homicides as part of “noise” in the data — it's within the expected range of the number of homicides we'd expect to see if it's a stable and predictable system. It doesn't mean anything necessarily changed in 2016. But, when we're close to one of the limits, it's possible that the data point is a signal and that there's a special cause behind it.
I don't mean to de-humanize the homicide victims and their families by treating them as data. The Express-News site has the stories of these victims.
There were 136 homicides back in 2008, so 149 in 2016 doesn't seem completely out of line with the past. And a decrease to 125 in 2017 (whether or not it's due to the task force) still leaves a relatively high homicide rate compared to the past decade and its average of 110 or so.
Again, just because the process behavior chart shows an expected average and range, that doesn't mean San Antonio has to accept that. They can work to improve the system (through task forces or other means) rather than just being reactive and looking at any one or two data points.
If I create a process behavior chart using 2007 to 2017 (adding the city's 2016 and 2017 data to the newspaper's earlier data), it looks like this:
I then found data going back to 2002 from City-data.com, so I created a process behavior chart using 2002 to 2017 data points as the baseline for the average and the limits that are calculated from this data:
The average and limits are lower because 2002 to 2005 were all “below average” years compared to the longer-term average.
In this scenario, 2016 does appear to be higher than the upper process behavior limit, which suggests it is an outlier or a signal and that suggests there is likely a “special cause” for that data point. It would be, then, worthwhile to ask “what was different about 2016?”
Is 2016 a signal or not? It depends on the baseline data that was used. More on this in my summary at the end of the post.
Using Monthly Data Instead
We can also create a process behavior chart based on monthly data (from the City of San Antonio), which provides more data points to work with (while recognizing there is bound to be more month-to-month variation than we'd see in year-to-year data). Sure, there's a different number of days in each month, but let's see what the data tell us when viewed monthly. I used the first 25 data points to calculate an average and the limits… here is the process behavior chart:
Using the “Western Electric Rules,” the main things we're looking for include:
- A single data point above or below the limits
- 8 consecutive data points above or below the average
In October and November 2014, we see a different “Western Electric Rule” being triggered:
- 2 out of 3 consecutive data points near the limit
That's a signal, which would trigger us to ask “what happened in October and November 2014?”
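The rules above can be sketched in code. This is a rough illustration, not a definitive implementation: the function name is hypothetical, the data are made up, and "near the limit" is interpreted here as more than two-thirds of the way from the average to the limit on the same side, which is a common convention but my assumption, not something defined in this post.

```python
def find_signals(values, average, lower, upper, run_length=8):
    """Return (index, rule) tuples for three common signal rules."""
    signals = []

    # Rule 1: a single data point above or below the limits.
    for i, v in enumerate(values):
        if v > upper or v < lower:
            signals.append((i, "outside limits"))

    # Rule 2: run_length consecutive points on the same side of the average.
    # Every point that extends a qualifying run is reported.
    run_side, run_count = 0, 0
    for i, v in enumerate(values):
        side = (v > average) - (v < average)  # +1 above, -1 below, 0 on the line
        if side != 0 and side == run_side:
            run_count += 1
        else:
            run_side, run_count = side, (1 if side != 0 else 0)
        if run_count >= run_length:
            signals.append((i, f"run of {run_length}"))

    # Rule 3: 2 out of 3 consecutive points near the same limit.
    # ASSUMPTION: "near" means beyond two-thirds of the average-to-limit distance.
    high_zone = average + (upper - average) * 2 / 3
    low_zone = average - (average - lower) * 2 / 3
    for i in range(2, len(values)):
        window = values[i - 2 : i + 1]
        near_upper = sum(v > high_zone for v in window)
        near_lower = sum(v < low_zone for v in window)
        if near_upper >= 2 or near_lower >= 2:
            signals.append((i, "2 of 3 near limit"))

    return signals


# Made-up illustrative data with average 10 and limits at 4 and 16.
toy = [10, 17, 9, 11, 11, 11, 11, 11, 11, 11, 11]
print(find_signals(toy, average=10, lower=4, upper=16))
# [(1, 'outside limits'), (10, 'run of 8')]
```

The average and limits are passed in rather than recalculated, since they come from a fixed baseline period and are then used to evaluate later points.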
We see a single data point above the limit in September 2015. What happened?
Then, starting in January 2016, we see a run of well more than eight consecutive data points above the average. This isn't surprising since the annual 2016 data point was very high.
Was there a new “system” established in February 2016 or so? If I create a new average and new limits, it looks like this (using February 2016 to October 2017 for the new baseline):
What we see, unfortunately, is that the average monthly homicide rate increased from 7.28 to 11.7. Why is that? Also, the variation increased so, going forward, San Antonio would expect to see between zero (the lower limit can't be negative here, even if that's what the calculation says) and 26.5 homicides each month.
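To make the "can't be negative" point concrete, here is a small sketch that works backward from the two numbers quoted above. It assumes the limits are symmetric around the average, as they are in the usual XmR calculation; the function name is hypothetical.

```python
def floored_lower_limit(average: float, upper_limit: float):
    """Return (raw_lower, lower_used), assuming limits symmetric about the average."""
    half_width = upper_limit - average
    raw_lower = average - half_width
    # A count of homicides can't be negative, so floor the limit at zero.
    return raw_lower, max(0.0, raw_lower)


# New monthly average and upper limit quoted in this post.
raw, used = floored_lower_limit(average=11.7, upper_limit=26.5)
print(f"calculated lower limit: {raw:.1f}")  # -3.1
print(f"lower limit used: {used:.1f}")       # 0.0
```

The calculation lands below zero, so zero is what goes on the chart.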
If the city's task force and other efforts are going to bring the homicide rate back down, I'd first look for eight consecutive months below this new average of 11.7. There were three data points there already (August, September, and October 2017). I'm not sure what the November or December monthly data showed.
I hope that average will come back down and that San Antonio can find ways to reduce the average even more.
So What's the Point?
I hope this analysis makes sense and that I'm explaining this clearly (as I practice explaining concepts for my next book Measures of Success). The book will, of course, contain detail about how to create these charts and how to calculate the limits (as explained here).
Point 1: Don't accept simple two-data-point comparisons in your workplace. This time of year, you'll hear a lot of reports about how some organizational performance metric was higher or lower in 2017 compared to 2016. Ask for more data points. Plot the dots. Look for statistically valid trends (such as eight or more consecutive points above or below the average).
Point 2: Yes, the process behavior charts and limits are a bit sensitive to the timeframe you choose as a baseline. I'd suggest not manipulating the methodology by selectively choosing a baseline timeframe that leads to the answer you want to see. Try to look at the “voice of the process” honestly. Use 20 to 25 baseline data points when you can.
Point 3: Sometimes “rolling up” data into annual buckets can mask variation in the monthly or weekly data. We should consider the timeframes that we use in charting our metrics. When using monthly metrics, we have the opportunity to detect signals more quickly, which means we can start investigating causes more quickly.
Point 4: As always, one helpful benefit of process behavior charts is that they can help us avoid overreacting to every up and down in the data. We sometimes have to improve by using more systematic methods than just asking “what went wrong last year?”
What do you think about this type of analysis? Would this be helpful for your organization?