Lies, Damned Lies, and Statistics
Up to 72 percent of people believe anything they read on the Internet. The other 67 percent don't care.
Image source: Midjourney.
When I am writing a story -- journalism or something vaguely resembling it -- I inevitably arrive at a point where I need a number. Doesn't matter what the story is about. I need a convincing statistic that supports the argument I am trying to make: something that reassures the reader that the thing about which I am opining is actually happening to other people very much like themselves, according to some Organization With An Impressive-Sounding Name.
Even more than that: I crave a number. I also need reassurance that what I'm writing about is believed by someone other than the person who hired me to write it. Journalists are addicted to numbers. Studies, surveys, reports, polls, white papers, infographics, yadda yadda. Cannot get enough of them. That's because assigning numbers to things makes them seem more real, even if the numbers themselves actually aren't.
PR professionals know this, which is why one of the easiest ways to get their clients' names into a story is by publishing the results of a survey. Were you aware that the number of studies published on the Internet has grown more than 37 percent year over year? [1]
I am rambling on about this because I just read a great story in The Atlantic, "The Statistics That Come Out of Nowhere," [2] that nails this phenomenon. Written by professors at Boston University, Columbia, and Harvard, it's about how numbers that often have, at best, a tenuous relationship to reality get quoted so often that they're simply accepted as fact. These "decorative statistics" are then used to manipulate public opinion or justify policy decisions.
To wit:
In one of a series of 1991 speeches...Vice President Dan Quayle [3] remarked that the United States had too much litigation and too many lawyers, as evidenced by the fact that 70 percent of the world’s lawyers were American—a number that was then repeated by authority figures across the political spectrum. But as the law professor Marc Galanter calculated at the time, America’s share of lawyers was probably more like 25 to 35 percent, roughly in line with the U.S. share of global GDP in the early 1990s. Quayle also claimed that lawsuits (and the threat of lawsuits) cost Americans $300 billion a year. That equally alarming estimate—also widely quoted in discussions of tort reform—came from a Forbes magazine article that quoted a back-of-the-envelope calculation by a corporate-defense lawyer who used as his foundational cost estimate an offhand, unsourced assertion that a CEO had made at a roundtable discussion. The news that the civil-justice system costs the country $300 billion annually was, in Galanter’s memorable phrasing, “news from nowhere.”
Among other things, the three authors recommend that when you see a statistic that seems a little dubious, trace it back to the original source and see if there's any there, there.
Well, good luck with that. For some pieces, I spend more time chasing down numbers than actually writing. Like a boy searching for a pony in a room filled with manure, I comb the interwebs, seeking canonical sources.
I have my own number that drives me nuts: 175 zettabytes. That's the amount of data the world will allegedly produce annually by the year 2025, according to International Data Corp.
(FYI: A zettabyte is 1,000 exabytes, equivalent to 1,000,000 petabytes, 1,000,000,000 terabytes, or 1,000,000,000,000 gigabytes. [4] Sometimes people like me try to help readers visualize these numbers -- "enough data to fill 3.5-inch floppy disks stretching from Earth to Uranus," for example -- but it's still impossible to wrap your head around. All you need to know is that it's a Big F*cking Number.)
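If you'd rather check that conversion than take my word for it, here's a minimal Python sketch. It assumes decimal (SI) prefixes, where each step up is a factor of 1,000; the binary convention (zebibytes and friends) uses 1,024 and gives slightly different numbers. The `convert` helper is something I made up for illustration, not part of any library.

```python
# Decimal (SI) storage units, smallest to largest.
# Assumption: each step up is a factor of 1,000 (SI), not 1,024 (binary).
UNITS = ["bytes", "kilobytes", "megabytes", "gigabytes",
         "terabytes", "petabytes", "exabytes", "zettabytes"]


def convert(amount: float, from_unit: str, to_unit: str) -> float:
    """Hypothetical helper: convert an amount between decimal storage units."""
    # Count how many factor-of-1,000 steps separate the two units.
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    return amount * 1000 ** steps


for unit in ["exabytes", "petabytes", "terabytes", "gigabytes"]:
    print(f"1 zettabyte = {convert(1, 'zettabytes', unit):,} {unit}")
```

Run it and you get 1,000 exabytes, 1,000,000 petabytes, 1,000,000,000 terabytes, and 1,000,000,000,000 gigabytes, matching the figures above. Multiply by 175 and you have IDC's number in gigabytes, which is no easier to picture.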
The number itself was probably pulled out of someone's butt (sorry, "a very rough guess") and dates back to a report published in 2018. But it's useful in illustrating something that is true: the volume of data we produce is growing exponentially. And it's quoted everywhere.
Here are two more factoids that have been repeated ad nauseam: 1) Each day we produce 2.5 quintillion bytes of data (that's 2.5 billion gigabytes for those of you keeping track at home), and 2) Ninety percent of the world's data has been produced over the last two years. These chestnuts are buttered all over the Internet. They come from a report published by IBM in 2012, yet people continue to trot them out year after year. Is this still true? Was it ever? Who knows?
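The first factoid at least survives a unit check. Here's a quick Python sketch that converts IBM's daily figure to gigabytes and then annualizes it, assuming decimal units and a 365-day year. It also sets the result next to IDC's 175-zettabyte forecast, with a big caveat: the two numbers come from different organizations, different years, and possibly different definitions of "data produced," so the comparison shows scale rather than proving either one wrong.

```python
# Sanity-check the 2012 factoid: 2.5 quintillion bytes of data per day.
# Assumptions: decimal units (1 GB = 10**9 bytes, 1 ZB = 10**21 bytes)
# and a 365-day year.
BYTES_PER_DAY = 2.5e18   # 2.5 quintillion (short scale) = 2.5 x 10^18
GB = 10**9
ZB = 10**21
IDC_2025_ZB = 175        # IDC's forecast for annual data by 2025

gb_per_day = BYTES_PER_DAY / GB
zb_per_year = BYTES_PER_DAY * 365 / ZB

print(f"{gb_per_day:,.0f} GB per day")   # 2,500,000,000 GB per day
print(f"{zb_per_year:.2f} ZB per year")  # 0.91 ZB per year
print(f"IDC's 175 ZB is about {IDC_2025_ZB / zb_per_year:.0f}x that")
```

The daily figure does work out to 2.5 billion gigabytes, as advertised, or a bit under 1 zettabyte a year. If both numbers are even roughly right, anyone quoting the 2012 figure as a description of 2025 would be off by a factor of nearly 200.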
In both of those cases, the numbers may be dubious, but they point to a real phenomenon. A bigger problem is when real numbers are manipulated to make a political point. For example, I could tell you that just over 0.3 percent of the US population has died from Covid. Or I could tell you that roughly 1.2 million people have died. Both are accurate. But one number makes the impact seem trivial; the other, tragic.
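Here's the same fact framed both ways in a few lines of Python. The population figure is my assumption (roughly 333 million, a round number in the neighborhood of recent Census estimates), not something from a canonical source, so plug in your own. With those inputs, the death toll works out to a bit more than a third of 1 percent, or roughly 1 in 280 people.

```python
# Same fact, different framings.
# Assumption: a U.S. population of roughly 333 million; swap in
# whatever current Census estimate you trust.
US_POPULATION = 333_000_000  # assumed, not from a canonical source
COVID_DEATHS = 1_200_000     # "roughly 1.2 million"

share = COVID_DEATHS / US_POPULATION

print(f"Framing A: just {share:.2%} of the population")  # just 0.36% of the population
print(f"Framing B: {COVID_DEATHS:,} people")              # 1,200,000 people
print(f"Framing C: about 1 in {US_POPULATION / COVID_DEATHS:.0f} Americans")
```

Nothing in the arithmetic changes between framings; only the denominator you choose to show the reader does.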
As we all trudge forward, lab rats in this experiment known as the Internet, it is getting harder and harder to tell the real from the fake. AI is going to make this job even more challenging.
There's a saying among cranky old journalists: If your mother says she loves you, check it out. In other words, never accept anything at face value. So if you run across a number on the Internet that seems absurd or surprising, check it out. There's a 57.2 percent chance it isn't really true.
Is there some "fact" you found on the Internet that seems impossible to believe? Email me: crankyolddan AT gmail.com, and I'll check it out.
[1] I just totally made that up. Did you believe it?
[2] Hopefully this is accessible to nonsubscribers. If you can't get to it, email me and I'll send you a PDF.
[3] Remember when Dan Quayle was the dumbest politician to ever hold major public office? Good times.
[4] Don't even get me started on Yottabytes, Brontobytes, or Geopbytes. Were these names all invented by Dr. Seuss?
I'm with the remaining 39 percent who believe that this isn't going to end well. Although, 7 out of 10 experts aren't sure.