Lately there’s been a lot of conversation about how ineffective or potentially misleading averages are for describing data. That’s a good thing, because averages do seem to get misused or misapplied with all sorts of data.
A nice way to understand the weakness of an average is to apply it to an area where our intuition is already pretty good: vision. Specifically, let’s average some images. Images are just data in a matrix. Every pixel or dot on your screen has a number (or three if it’s RGB color) behind it.
The images I’ll show are 8-bit grayscale. Each pixel has one number from 0 to 255, with 0 being pure black, 255 being pure white, and everything in between being some shade of gray.
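To make "an image is just a matrix of numbers" concrete, here is a minimal sketch using NumPy (my choice for illustration). The tiny 3×3 array is made up and is not one of the images in this post.

```python
import numpy as np

# A made-up 3x3 grayscale "image": one 8-bit number per pixel.
# 0 is pure black, 255 is pure white, values in between are grays.
tiny_image = np.array(
    [[0, 128, 255],
     [64, 128, 192],
     [255, 128, 0]],
    dtype=np.uint8,
)

print(tiny_image.shape)                    # rows and columns of pixels
print(tiny_image.min(), tiny_image.max())  # darkest and brightest pixel
print(tiny_image.mean())                   # the single number an average keeps
```

A real photo works the same way, only with many more rows and columns.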
So let’s say I put this picture in front of you. How would you describe it?
You might say it’s grainy, black and white, gray, and kind of the same random pattern all around. You might even see the texture and think maybe it’s a picture of a pumice stone? It’s not. It’s just noise I generated.
Here’s what the above picture would look like if we took all the pixels, added them up, and divided the sum by the total number of pixels. We then assign the resulting number to every pixel in the image below. We’ll call it our mean filtered image.
We’ve lost some information. About the only thing we’ve retained is that it’s generally a gray image. We’ve lost that it’s grainy and coarse. We’ve lost any hint that it could have been a pumice stone. We’ve lost the variation, the blacks, the whites, the grays. All we have is gray. Endless gray.
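The averaging step described above can be sketched in a few lines of NumPy. The noise parameters (mean 128, standard deviation 30), the 200×200 size, and the seed are assumptions made for illustration. They are not necessarily the settings behind the picture shown here.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, only for repeatability

# Assumed noise: roughly normal pixel values centered on mid-gray.
noise = rng.normal(loc=128, scale=30, size=(200, 200))
image = np.clip(noise, 0, 255).astype(np.uint8)

# Add up every pixel and divide by the number of pixels.
mean_value = image.mean()

# Assign that one number to every pixel of a new image of the same size.
mean_image = np.full(image.shape, round(mean_value), dtype=np.uint8)

print("original spread:", image.min(), image.max(), image.std())
print("mean image spread:", mean_image.min(), mean_image.max(), mean_image.std())
```

The original has a wide range of values and a nonzero standard deviation. The mean image has exactly one value everywhere, so its spread is zero.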
So, here’s the thing: the original image was a pretty good candidate for using an average on. Its pixel values are what would be called approximately normally distributed, so they cluster symmetrically around the mean. Even then, the average isn’t an adequate description of the original image we saw.
Unfortunately, a lot of data isn’t even a good candidate for using an average on. Let’s try a different image…you know the drill, describe it:
You might have described the beautiful dark pixelated background, the fact that there is text, the conspiratorial message, “the average is a lie!” You might have described the faded grayish heart to the right. You may have noticed the slender and tall capital letters. You may have taken note of the exclamation point that punctuates the conspiratorial message, or the bright white text that stands in stark contrast against the dark but textured background. There is the subtle disconnect between the urgent and spooky text and the otherwise happy little heart to the right.
Now let’s describe this image data with an average…
Hmmm, something seems to have gone horribly wrong. The average tells us that the picture we saw is just a dark gray. It’s not wholly incorrect; it’s just probably not how you would describe the data you saw before.
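Why does the average land on a dark gray that describes almost nothing in the picture? Here is a rough sketch using a hypothetical stand-in image: a flat dark background with one bright band in place of the white text. The pixel values 20 and 250 and the 90/10 split are assumptions, and the real image has a textured background and a heart that this ignores.

```python
import numpy as np

# Hypothetical stand-in: dark background (value 20) everywhere...
image = np.full((100, 100), 20, dtype=np.uint8)
# ...with a bright band (value 250) covering 10% of the pixels, like white text.
image[45:55, :] = 250

mean_value = image.mean()  # 0.9 * 20 + 0.1 * 250

# Count pixels whose value is within 10 gray levels of the mean.
distance_from_mean = np.abs(image.astype(int) - mean_value)
pixels_near_mean = int(np.sum(distance_from_mean < 10))

print(mean_value)        # 43.0
print(pixels_near_mean)  # 0
```

In this stand-in, the average is a dark gray that not a single pixel actually has. It sits between the two groups of values and resembles neither.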
Now most data that people analyze won’t have a designed message embedded in it if you put it in a matrix and look at it as a picture. What a lot of data will have is variation, complexity, patterns, trends, or just random patterns and random “trends.”
If the only description of the data you get is an average (or the often used, but not much better, average and standard deviation), recognize that you could be missing a tremendous amount of information.
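To see how little the average and standard deviation pin down, here is a small sketch with two made-up images built from exactly the same pixel values. One is a smooth gradient and the other is the same pixels shuffled. The `roughness` helper is a hypothetical measure I introduce only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary seed, only for repeatability

# A smooth left-to-right gradient: every row runs 0, 1, 2, ..., 255.
gradient = np.tile(np.arange(256, dtype=np.uint8), (256, 1))

# The same pixels in random positions: the structure is gone.
shuffled = rng.permutation(gradient.ravel()).reshape(gradient.shape)


def roughness(img):
    """Hypothetical helper: mean absolute difference between horizontal neighbors."""
    return np.abs(np.diff(img.astype(int), axis=1)).mean()


print(gradient.mean(), shuffled.mean())      # same average
print(gradient.std(), shuffled.std())        # same standard deviation
print(roughness(gradient), roughness(shuffled))  # very different structure
```

Both summaries match, yet one image is a clean ramp and the other looks like static. Only something that looks at where the values sit, such as a plot or the picture itself, tells them apart.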
So whenever you can, demand plots and visualizations that actually show the data. Or better yet, get the original data and look at it yourself. Otherwise, ignore the average. And even with the data in hand, keep in mind that you could still be missing the whole picture.
But wait, there’s one more thing to be careful about. Remember that first noisy picture? Maybe you thought it looked like a pumice stone? We had more information with our original image than we did with the average. But that didn’t mean our interpretation was more accurate. Some of us saw a pumice stone where no such thing existed.