As I mentioned in June, I'm in Knoxville this week to take four days of classes with Donald J. Wheeler, Ph.D. I've long admired Dr. Wheeler and his work and I was thrilled when he wrote the foreword to my latest book.
One lesson from Understanding Variation is to not overreact to a report about a single data point.
I recently saw this headline:
Just because it's the lowest average in 48 years… it doesn't mean that this year's MLB-wide batting average is low in a way that's statistically meaningful.
What can we do? Go get the data. As Wheeler says, “without context, data have no meaning.” As I've done before, I go to baseball-reference.com to find the annual MLB-wide averages.
If we did what we see in many workplace reports, we'd report something like “the 2018 batting average of .249 is lower than last year's average of .255.” But, as always, we have to be careful about comparing two data points. Is that lower batting average a meaningful “signal,” or is it just “noise” in the data?
What can we do? Plot the dots. Even a simple run chart for recent years tells us more than a comparison of two data points:
If we create a Process Behavior Chart with an average and calculated Natural Process Limits for these seasons starting in 2010, the 2018 number is above the Lower Limit, so it's not a signal in and of itself:
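For readers who want to reproduce this kind of calculation, here is a minimal sketch of how the average and Natural Process Limits can be computed. It assumes the standard XmR-chart formula (average plus or minus 2.66 times the average moving range), which this post does not spell out. The function name `natural_process_limits` and the sample numbers are made up for illustration and are not the MLB data.

```python
from statistics import mean


def natural_process_limits(values):
    """Return (average, lower_limit, upper_limit) for an XmR-style chart.

    Assumes the usual XmR scaling factor of 2.66 applied to the
    average moving range between consecutive points.
    """
    if len(values) < 2:
        raise ValueError("need at least two data points")
    average = mean(values)
    # Moving range: absolute difference between each point and the one before it.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    average_moving_range = mean(moving_ranges)
    spread = 2.66 * average_moving_range
    return average, average - spread, average + spread


# Placeholder numbers for illustration only (NOT real batting averages).
sample = [10, 12, 11, 13, 12]
average, lower, upper = natural_process_limits(sample)
print(f"average={average:.2f} lower={lower:.2f} upper={upper:.2f}")
```

To check a real metric, replace `sample` with the yearly values from your baseline period and compare each later point against `lower` and `upper`.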
But what if we get more data and more context? Here is the Process Behavior Chart that I created going back to 1950 (an admittedly arbitrary date). I calculated the average and the limits from the first 25 years. We can see how averages have shifted over time. It's not a single predictable system over time. Things have changed in the game of baseball:
Below, I've shifted the average and limits over time as those changes appeared to occur. We know we have sustained shifts in performance when eight or more consecutive years are above or below the established average (and we look for points outside of the limits).
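The two checks described here (a run of eight or more consecutive points on one side of the average, and any point outside the limits) can be sketched in code. This is only an illustration: the function name `find_signals` and the sample values are hypothetical, and the average and limits are passed in as already-known numbers. A point exactly equal to the average is treated as breaking a run, which is an assumption on my part.

```python
def find_signals(values, average, lower, upper, run_length=8):
    """Return (outside, runs).

    outside: indexes of points below `lower` or above `upper`.
    runs: (start, end) index pairs for stretches of at least `run_length`
          consecutive points all above, or all below, `average`.
    """
    outside = [i for i, v in enumerate(values) if v < lower or v > upper]

    runs = []
    run_start, side = 0, 0  # side: +1 above average, -1 below, 0 on it
    for i, v in enumerate(values):
        current = (v > average) - (v < average)
        if current != side:
            # The side changed, so close the previous run if it was long enough.
            if side != 0 and i - run_start >= run_length:
                runs.append((run_start, i - 1))
            run_start, side = i, current
    # Close the final run once the data ends.
    if side != 0 and len(values) - run_start >= run_length:
        runs.append((run_start, len(values) - 1))
    return outside, runs


# Placeholder numbers for illustration only (NOT real batting averages).
values = [5, 5, 5] + [7] * 8 + [4, 20]
outside, runs = find_signals(values, average=6, lower=0, upper=10)
print(outside, runs)  # prints [12] [(3, 10)]
```

A run reported this way is the cue to recalculate the average and limits for the new period, and then to go find out what changed in the system.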
A Process Behavior Chart only tells us something has changed. It doesn't tell us WHAT changed. We need to understand our system. In a Lean workplace, we'd go to the gemba.
It seems that the shifts occurred:
- 1963 – shift downward (why?)
- 1973 – shift upward (maybe because the Designated Hitter was added to the American League?)
- 1993 – shift upward again (due to the “steroid era?”)
- 2010 – shift downward (due to PEDs being driven out of the game and an increase in defensive shifts?)
It's not worth asking “why was the batting average higher or lower?” when those data points are in the realm of noise. It is worth asking why the system changed when we see signals in the metric.
2018's average is low for this century. It's not historically low if you go back more than 50 years.
Zooming in on the most recent years, the 2018 average of .249 is just above the lower limit of .248. It's not a signal.
But, as the game evolves (more defensive shifts, more emphasis on home runs and power pitchers), is 2018 the start of yet another shift downward in major league batting averages or will it fluctuate back up next year? Only time will tell.