Daily data: be sceptical

Be careful about data you encounter every day, especially in the news.

If you watch the news, you are exposed to all sorts of numbers, intended to provide information. Some might be reliable, such as football scores, but with others it’s harder to know, for example the number of people killed in a bomb attack in Syria, the percentage of voters supporting a policy, the proportion of the federal budget spent on welfare, or the increase in the average global temperature.

Should you trust the figures or be sceptical? If you want to probe further, what should you ask?

To answer these questions, it’s useful to understand statistics. Taking a course or reading a textbook is one approach, but that will mainly give you the mathematical side. To develop a practical understanding, there are various articles and books aimed at the general reader. Demystifying Social Statistics gives a left-wing perspective, a tradition continued by the Radstats Group. Joel Best has written several books, for example Damned Lies and Statistics, providing valuable examinations of statistics about contested policy issues. The classic treatment is the 1954 book How to Lie with Statistics.

Most recently, I’ve read Everydata, a newly published book by John H. Johnson and Mike Gluck. It’s engaging, informative and ideal for readers who want a practical understanding without encountering any formulas. It is filled with examples, mostly from the US.

You might have heard about US states being labelled red or blue. Red states are where people vote Republican and blue states are where people vote Democrat. Johnson and Gluck use this example to illustrate aggregated data and how it can be misleading. Just because Massachusetts is a blue state doesn’t mean no one there votes Republican. In fact, quite a lot of people in Massachusetts vote Republican, just not a majority. Johnson and Gluck show pictures of the US with the data broken down by county rather than by state, and a very different picture emerges.

Red, blue and in-between states

In Australia, aggregated data is commonly used in figures for economic growth. Typically, a figure is given for gross domestic product or GDP, which might have grown by 2 per cent in the past year. But this figure hides all sorts of variation. The economies of different states grow at different rates, as do different industries, and some industries contract. When the economy grows, this doesn’t mean everyone benefits. In recent decades, most of the increased income has gone to the wealthiest 1 per cent, and many in the other 99 per cent are no better off or have gone backwards.

The lesson here is that when you hear a figure, think about what it applies to and whether there is underlying variation.
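To make the point concrete, here is a small Python sketch. The county names and vote counts are invented for illustration; they are not taken from Everydata or from any real election, and the helper name label is hypothetical.

```python
# Hypothetical vote counts per county as (Democrat, Republican).
# These numbers are invented for illustration only.
counties = {
    "County A": (60_000, 40_000),
    "County B": (30_000, 45_000),
    "County C": (55_000, 35_000),
}


def label(dem, rep):
    """Colour an area by the party with more votes (a tie counts as red here)."""
    return "blue" if dem > rep else "red"


# Aggregate: add up every county to get a single statewide figure.
state_dem = sum(dem for dem, rep in counties.values())
state_rep = sum(rep for dem, rep in counties.values())
state_label = label(state_dem, state_rep)
republican_share = state_rep / (state_dem + state_rep)

# Disaggregate: label each county separately.
county_labels = {name: label(dem, rep) for name, (dem, rep) in counties.items()}

print(f"State as a whole: {state_label}")
print(f"Republican share statewide: {republican_share:.0%}")
print(f"County by county: {county_labels}")
```

With these made-up numbers, the state is labelled blue even though about 45 per cent of voters chose Republican and one of the three counties is red.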

In the Australian real estate market, figures are published for the median price of houses sold. The median is the middle figure. If three houses were sold in a suburb, for $400,000, $1 million and $10 million, the median is $1 million: one house sold for less and one for more. The average, calculated as total sales prices divided by the number of sales, is far greater: it is $3.8 million, namely $0.4m + $1m + $10m divided by 3.
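The three-house arithmetic can be checked in a few lines of Python using the standard statistics module. The variable names are just for illustration.

```python
from statistics import mean, median

# Sale prices from the three-house example, in dollars.
prices = [400_000, 1_000_000, 10_000_000]

middle = median(prices)   # the middle value once the prices are sorted
average = mean(prices)    # total of the prices divided by the number of sales

print(f"Median:  ${middle:,.0f}")
print(f"Average: ${average:,.0f}")
```

The single $10 million sale pulls the average up to $3.8 million while leaving the median at $1 million.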

The median price is a reasonable first stab at the cost of housing, but it can be misleading in several ways. What if most of the houses sold in a given period happen to be low-priced ones, or high-priced ones? If just three houses sold, how reliable is the median? If the second house sold for $2 million rather than $1 million, the median would become $2 million, quite a jump.
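How jumpy is a median based on only a few sales? The sketch below first repeats the swap just described, then simulates many months of sales. The simulation rests on assumptions that are not in the book: prices are drawn from a log-normal distribution with made-up parameters, and the helper name median_spread is hypothetical.

```python
import random
from statistics import median, pstdev

# Changing one of the three sales moves the median from $1 million to $2 million.
before = median([400_000, 1_000_000, 10_000_000])
after = median([400_000, 2_000_000, 10_000_000])


def median_spread(sales_per_month, months=1_000, seed=0):
    """Standard deviation of the monthly median across simulated months.

    Assumption: prices follow a log-normal distribution with invented
    parameters. Real housing markets will differ.
    """
    rng = random.Random(seed)
    medians = []
    for _ in range(months):
        prices = [rng.lognormvariate(13.8, 0.6) for _ in range(sales_per_month)]
        medians.append(median(prices))
    return pstdev(medians)


spread_3 = median_spread(3)      # months with only three sales
spread_300 = median_spread(300)  # months with three hundred sales

print(f"Median before and after the swap: ${before:,.0f} -> ${after:,.0f}")
print(f"Spread of the median with 3 sales:   ${spread_3:,.0f}")
print(f"Spread of the median with 300 sales: ${spread_300:,.0f}")
```

Under these assumptions, the median bounces around far more from month to month when it is based on three sales than when it is based on three hundred, even though the underlying market is identical.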

Is the average or median house price misleading?

In working on Everydata, Johnson and Gluck contacted many experts and have used quotes from them to good effect. For example, they quote Emily Oster, author of Expecting Better: Why the Conventional Pregnancy Wisdom is Wrong, saying “I think the biggest issue we all face is over-interpreting anecdotal evidence” and “It is difficult to force yourself to ignore these anecdotes – or, at a minimum, treat them as just one data point – and draw conclusions from data instead.” (p. 6)

Everydata addresses sampling, averages, correlations and much else, indeed too much to summarise here. If Johnson and Gluck have a central message, it is to be sceptical of data and, if necessary, investigate in more depth. This applies especially to data encountered in the mass media. For example, the authors comment, “We’ve seen many cases in which a finding is reported in the news as causation, even though the underlying study notes that it is only correlation.” (p. 46) Few readers ever check the original research papers to see whether the findings have been reported accurately. Johnson and Gluck note that data coming from scientific papers can also be dodgy, especially when vested interests are involved.

The value of a university education

For decades, I’ve read stories about the benefits of a university education. Of course there can be many sorts of benefits, for example acquiring knowledge and skills, but the stories often present a figure for increased earnings through a graduate’s lifetime.

This is an example of aggregated data. Not everyone benefits financially from having a degree. If you’re already retired, there’s no benefit.

There’s definitely a cost involved, both fees and income forgone: you could be out earning a salary instead. So for a degree to help financially, the extra earnings afterwards have to outweigh the fees and the income forgone while studying.

The big problem with calculations about benefits is that they don’t compare like with like. They compare the lifetime earnings of those who obtained degrees to the lifetime earnings of those who didn’t, but people aren’t assigned to these two groups at random. Compared to those who don’t go to university, those who do are systematically different: they tend to come from well-off backgrounds, to have had higher performance in high school and to have a greater capacity for studying and deferred gratification.
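A toy simulation shows how this can inflate the apparent payoff. Everything in it is an assumption made for illustration: the dollar figures are invented, the single "advantage" score stands in for background, school results and study habits, the function name earnings_gap is hypothetical, and the degree itself is deliberately given no effect on earnings at all.

```python
import random
from statistics import mean


def earnings_gap(random_assignment, true_degree_effect=0.0, n=20_000, seed=1):
    """Average earnings of graduates minus average earnings of non-graduates.

    Toy model with invented numbers: 'advantage' raises earnings whether
    or not the person has a degree.
    """
    rng = random.Random(seed)
    graduates, others = [], []
    for _ in range(n):
        advantage = rng.gauss(0, 1)
        if random_assignment:
            # A coin toss decides who gets a degree.
            has_degree = rng.random() < 0.5
        else:
            # More advantaged people are more likely to attend university.
            has_degree = advantage + rng.gauss(0, 1) > 0
        earnings = 50_000 + 10_000 * advantage + rng.gauss(0, 5_000)
        if has_degree:
            earnings += true_degree_effect
            graduates.append(earnings)
        else:
            others.append(earnings)
    return mean(graduates) - mean(others)


observed_gap = earnings_gap(random_assignment=False)
randomised_gap = earnings_gap(random_assignment=True)

print(f"Gap when advantaged people self-select: ${observed_gap:,.0f}")
print(f"Gap under coin-toss assignment:         ${randomised_gap:,.0f}")
```

In this toy model the degree adds nothing to earnings, yet the self-selected comparison shows graduates well ahead, because the two groups differed before anyone enrolled. Under coin-toss assignment the gap shrinks to roughly zero.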

Where’s the study of groups with identical attributes, for example identical twins, comparing careers in the same field with and without a degree? Then there’s another problem. For some occupations, it is difficult or impossible to enter or advance without a degree. How many doctors or engineers do you know without degrees? It’s hardly fair to calculate the economic benefits of university education when occupational barriers are present. A fair comparison would look only at occupations where degrees are not important for entry or advancement, and only performance counts.

A final example

For those who want to go straight to takeaway messages, Johnson and Gluck provide convenient summaries of key points at the end of each chapter. However, there is much to savour in the text, with many revealing examples helping to make the ideas come alive. The following is one of my favourites (footnotes omitted).

Americans are bad at math. Like, really bad. In one study, the U.S. ranked 21st out of 23 countries. Perhaps that explains why A&W Restaurants’ burger was a flop.

As reported in the New York Times Magazine, back in the early 1980s, the A&W restaurant chain wanted to compete with McDonald’s and its famous Quarter Pounder. So A&W decided to come out with the Third Pounder. Customers thought it tasted better, but it just wasn’t selling. Apparently people thought a quarter pound (1/4) was bigger than a third of a pound (1/3).

Why would they think 1/4 is bigger than 1/3? Because 4 is bigger than 3.

Yes, seriously.

People misinterpreted the size of a burger because they couldn’t understand fractions. (p. 101)

John H. Johnson

Mike Gluck

John H. Johnson and Mike Gluck, Everydata: The Misinformation Hidden in the Little Data You Consume Every Day (Brookline, MA: Bibliomotion, 2016)

Brian Martin
bmartin@uow.edu.au