Mathematical models: the toxic variety

Job applications, credit ratings and the likelihood of being arrested can be affected by mathematical models. Some of the models have damaging effects.


In 1983, U.S. News & World Report – then a weekly newsmagazine in competition with Time and Newsweek – published a ranking of US universities. For U.S. News, this was a way to increase sales. Its ranking system initially relied on opinions of university presidents, but later diversified by using a variety of criteria. As years passed, the U.S. News ranking became more influential, stimulating university administrators to seek to improve rankings by hiring academics, raising money, building facilities and, in some cases, trying to game the system.

One of the criteria used in the U.S. News ranking system was undergraduate admission acceptance rates. A low acceptance rate was assumed to mean the university was more exclusive: a higher percentage of applicants to Harvard are rejected than at Idaho State.

US high school students planning further study are commonly advised to apply to at least three prospective colleges. Consider the hypothetical case of Sarah, an excellent student. She applies to Stanford, a top-flight university where she would have to be lucky to get in, to Michigan State, a very good university where she expects to be admitted, and to Countryside Tech, which offers a good education despite its ease of admission.

Sarah missed out at Stanford, as expected, and unfortunately was also rejected at Michigan State. So she anticipated going to Countryside Tech, but was devastated to be rejected there too. What happened?

The president of Countryside Tech was determined to raise his institution’s ranking. One part of this effort was a devious admissions policy. Sarah’s application looked really strong, so admissions officers assumed she would end up going somewhere else. So they rejected her in order to lower Tech’s acceptance rate, making Tech seem more exclusive. Sarah was an unfortunate casualty of a competition between universities based on the formula used by U.S. News.


In Australia, the U.S. News rankings are little known, but other systems, ranking universities across the globe, are influential. In order to boost their rankings, some universities hire academic stars whose publications receive numerous citations. A higher ranking leads to positive publicity that attracts more students, bringing in more income. Many students mistakenly believe a higher ranking university will provide a better education, not realising that the academic stars hired to increase scholarly productivity are not necessarily good teachers. Indeed, many of them do no teaching at all. Putting a priority on hiring them means superb teachers are passed over and money is removed from teaching budgets.


The story of U.S. News university rankings comes from an important new book by Cathy O’Neil, Weapons of Math Destruction. O’Neil started off as a pure mathematician teaching in a US university, then decided to enter the private sector where she could do something more practical as a “data scientist.” Working for a hedge fund and then some start-ups, she soon discovered that the practical uses of data analysis and mathematical models were damaging to many ordinary people, especially those who are disadvantaged. She wrote Weapons of Math Destruction to expose the misuses of mathematical modelling in a range of sectors, including education, personal finance, policing, health and voting.

A model is just a representation of a bigger reality, and a mathematical model is one that uses numbers and equations to represent relationships. For example, a map is a representation of a territory, and usually there’s nothing wrong with a map unless it’s inaccurate or gives a misleading impression.


The models that O’Neil is concerned about deal with people and affect their lives, often in damaging ways. The model used by U.S. News, because it was taken so seriously by so many people, has distorted decisions by university administrators and harmed some students.

“Our own values and desires influence our choices, from the data we choose to collect to the questions we ask. Models are opinions embedded in mathematics.” (p. 21)

Another example is a model used to allocate police to different parts of a city. By collecting data about past crimes and other factors supposedly correlated with crime, the model identifies areas deemed to be at risk and therefore appropriate for more intensive policing.


This sounds plausible in the abstract, but in practice in the US the result is racially discriminatory even if the police are themselves unprejudiced. Historically, there have been more crimes in disadvantaged areas heavily populated by racial minorities. Putting more police in those areas means even more transgressions are discovered – everything from possession of illegal drugs to malfunctioning cars – and this leads to more arrests of people in these areas, perpetuating their disadvantage. Meanwhile, crimes that are not geographically located are ignored, including financial crimes of the rich and powerful.
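The self-reinforcing step in this argument can be sketched in a few lines of Python. This is a toy illustration, not a model from O’Neil’s book: the area names, the starting counts and the assumption that every patrol turns up the same number of offences wherever it is sent are all hypothetical.

```python
def simulate_patrols(recorded, total_patrols=100, finds_per_patrol=2, rounds=5):
    """Send patrols in proportion to recorded offences, then record what they find.

    Hypothetical assumption: each patrol finds the same number of offences
    in either area, so the areas do not differ in actual behaviour.
    """
    recorded = dict(recorded)  # copy so the caller's data is left untouched
    history = []
    for _ in range(rounds):
        total = sum(recorded.values())
        # More past records means more patrols this round.
        patrols = {area: total_patrols * count / total
                   for area, count in recorded.items()}
        # More patrols means more offences discovered and recorded.
        for area in recorded:
            recorded[area] += patrols[area] * finds_per_patrol
        history.append(dict(recorded))
    return history


# Hypothetical starting point: "north" has more offences on record.
history = simulate_patrols({"north": 60, "south": 40})
final = history[-1]
share_north = final["north"] / sum(final.values())
print(f"Share of recorded offences in north after 5 rounds: {share_north:.0%}")
```

In this sketch the initial imbalance in the records never corrects itself, even though the two areas behave identically: the allocation rule keeps producing data that appears to confirm it.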


Not every mathematical model is harmful. O’Neil says there are three characteristics of weapons of math destruction or WMDs: opacity, damage and scale. Opacity refers to how transparent the model is. If you can see how the model operates – its inputs, its algorithms, its outputs – then it can be subject to inspection and corrected if necessary. O’Neil cites models used by professional baseball clubs to recruit players and make tactical choices during games. These models are based on publicly available data: they are transparent.

In contrast, models used in many parts of the US to judge the performance of school teachers are opaque: the data on which they are based (student test scores) are not public, the algorithm is secret, and decisions made on the basis of the models (including dismissing teachers who are allegedly poor performers) are not used to improve the model.

The second feature of WMDs is damage. Baseball models are used to improve a team’s performance, so there’s little damage. Teacher performance models harm the careers and motivation of excellent teachers.

The third feature is scale. A model used in a household to decide on when to spend money can, at the worst, hurt the members of the household. If scaled up to the whole economy, it could have drastic effects.

O’Neil’s book is engaging. She describes her own trajectory from pure mathematician to disillusioned data scientist, and then has chapters on several types of WMDs, in education, advertising, criminal justice, employment, workplaces, credit ratings, insurance and voting. Without a single formula, she explains how WMDs work and what their consequences are.

The problems are likely to become worse, because data companies are collecting ever more information about individuals, everything from purchasing habits to opinions expressed on social media. Models are used because they seem to be efficient. Rather than reading 200 job applications, it is more efficient to use a computer program to read them and eliminate all but 50, which can then be read by humans. Rather than examining lots of data about a university, it is more efficient to look at its ranking. Rather than getting to know every applicant for a loan, it is more efficient to use an algorithm to assess each applicant’s credit-worthiness. But efficiency can come at a cost, including discrimination and misplaced priorities.

My experience

Earlier in my career, I did lots of mathematical modelling. My PhD in theoretical physics at the University of Sydney was about a numerical method for solving the diffusion equation, applied to the movement of nitrogen oxides introduced into the stratosphere. I also wrote computer programs for ozone photochemistry in the stratosphere, among related topics. My initial PhD supervisor, Bob May, was at the time entering the field of mathematical ecology, and I helped with some of his calculations. Bob made me co-author of a paper on a model showing the effect of interactions between voters.

During this time, I started a critical analysis of models for calculating the effect of nitrogen oxides, from either supersonic transport aircraft or nuclear explosions, on stratospheric ozone, looking in particular at the models used by the authors of two key scientific papers. This study led eventually to my first book, The Bias of Science, in which I documented various assumptions and techniques used by the authors of these two papers, and more generally in scientific research.

While doing my PhD, some other students and I studied the mathematical theory of games – used for studies in economics, international relations and other topics – and ran an informal course on the topic. This enabled me to later write a paper about the social assumptions underpinning game theory.

In the following decade, as an applied mathematician at the Australian National University, I worked on models in astrophysics and for incorporating wind power in electricity grids. Meanwhile, I read about biases in models used in energy policy.

I had an idea. Why not write a book or manual about mathematical modelling, showing in detail how assumptions influenced everything from choices of research topics to results? My plan was to include a range of case studies. To show how assumptions affected results, I could program some of the models and then modify parameters and algorithms, showing how results could be influenced by the way the model was constructed and used.

However, other projects took priority, and all I could accomplish was writing a single article, without any detailed examples. For years I regretted not having written a full critique of mathematical modelling. After obtaining a job in social science at the University of Wollongong, I soon discontinued my programming work and before long was too out of touch to undertake the critique I had in mind.

I still think such a critique would be worthwhile, but it would have quite a limited audience. Few readers want to delve into the technical details of a mathematical model on a topic they know little about. If I were starting today, it would be more illuminating to develop several interactive models, with the user being able to alter parameters and algorithms and see outcomes. What I had in mind, decades ago, would have been static and less effective.

What Cathy O’Neil has done in Weapons of Math Destruction is far more useful. Rather than provide mathematical details, she writes for a general audience by focusing on the uses of models. Rather than looking at models that are the subject of technical disputes in scientific fields, she examines models affecting people in their daily lives.

Weapons of Math Destruction is itself an exemplar – a model of the sort to be emulated – of engaged critique. It shows the importance of people with specialist skills and insider knowledge sharing their insights with wider audiences. Her story is vitally important, and so is her example in showing how to tell it.

“That’s a problem, because scientists need this error feedback – in this case the presence of false negatives – to delve into forensic analysis and figure out what went wrong, what was misread, what data was ignored. It’s how systems learn and get smarter. Yet as we’ve seen, loads of WMDs, from recidivism models to teacher scores, blithely generate their own reality. Managers assume that the scores are true enough to be useful, and the algorithm makes tough decisions easy. They can fire employees and cut costs and blame their decisions on an objective number, whether it’s accurate or not.” (p. 133)


Cathy O’Neil, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy (London: Allen Lane, 2016)

Brian Martin

Daily data: be sceptical

Be careful about data you encounter every day, especially in the news.


If you watch the news, you are exposed to all sorts of numbers, intended to provide information. Some might be reliable, such as football scores, but with others it’s harder to know, for example the number of people killed in a bomb attack in Syria, the percentage of voters supporting a policy, the proportion of the federal budget spent on welfare, or the increase in the average global temperature.

Should you trust the figures or be sceptical? If you want to probe further, what should you ask?

To answer these questions, it’s useful to understand statistics. Taking a course or reading a textbook is one approach, but that will mainly give you the mathematical side. To develop a practical understanding, there are various articles and books aimed at the general reader. Demystifying Social Statistics gives a left-wing perspective, a tradition continued by the Radstats Group. Joel Best has written several books, for example Damned Lies and Statistics, providing valuable examinations of statistics about contested policy issues. The classic treatment is the 1954 book How to Lie with Statistics.

Most recently, I’ve read Everydata, a new book by John H. Johnson and Mike Gluck. It’s engaging, informative and ideal for readers who want a practical understanding without encountering any formulas. It is filled with examples, mostly from the US.


You might have heard about US states being labelled red or blue. Red states are where people vote Republican and blue states are where people vote Democrat. Johnson and Gluck use this example to illustrate aggregated data and how it can be misleading. Just because Massachusetts is a blue state doesn’t mean no one there votes Republican. In fact, quite a lot of people in Massachusetts vote Republican, just not a majority. Johnson and Gluck show pictures of the US with the data broken down by county rather than by state, and a very different picture emerges.

Red, blue and in-between states

In Australia, aggregated data is commonly used in figures for economic growth. Typically, a figure is given for gross domestic product or GDP, which might have grown by 2 per cent in the past year. But this figure hides all sorts of variation. The economy in different states can grow at different rates, and different industries grow at different rates, and indeed some industries contract. When the economy grows, this doesn’t mean everyone benefits. In recent decades, most of the increased income has gone to the wealthiest 1 per cent, and many in the other 99 per cent are no better off or have gone backwards.

The lesson here is that when you hear a figure, think about what it applies to and whether there is underlying variation.
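To make the point concrete, here is a minimal sketch in Python. The three industries and their output figures are invented for illustration and are not Australian statistics; they are chosen so that the total grows by exactly 2 per cent.

```python
# Hypothetical output by industry, in billions of dollars (invented figures).
last_year = {"mining": 500, "services": 300, "manufacturing": 200}
this_year = {"mining": 525, "services": 300, "manufacturing": 195}


def growth(before, after):
    """Fractional change from before to after."""
    return (after - before) / before


# The aggregate figure that would be reported in the news.
overall = growth(sum(last_year.values()), sum(this_year.values()))

# The variation hidden underneath that figure.
by_industry = {name: growth(last_year[name], this_year[name])
               for name in last_year}

print(f"Overall growth: {overall:.1%}")
for name, rate in by_industry.items():
    print(f"  {name}: {rate:+.1%}")
```

With these made-up numbers the headline figure is 2 per cent growth, yet one industry grew by 5 per cent, one did not grow at all and one shrank by 2.5 per cent.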

In the Australian real estate market, figures are published for the median price of houses sold. The median is the middle figure. If three houses were sold in a suburb, for $400,000, $1 million and $10 million, the median is $1 million: one house sold for less and one for more. The average, calculated as total sales prices divided by the number of sales, is far greater: it is $3.8 million, namely ($0.4m + $1m + $10m) divided by 3.

The median price is a reasonable first stab at the cost of housing, but it can be misleading in several ways. What if most of the houses sold are at the low-priced end of the market, or at the high-priced end? If just three houses sold, how reliable is the median? If the second house sold for $2 million rather than $1 million, the median would become $2 million, quite a jump.
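The arithmetic in this example can be checked with Python's standard statistics module. The sale prices are the ones given in the text; the variable names are just for illustration.

```python
from statistics import mean, median

# The three sale prices from the example, in dollars.
sales = [400_000, 1_000_000, 10_000_000]

median_price = median(sales)   # the middle value
average_price = mean(sales)    # total divided by number of sales

# Change only the middle sale, from $1 million to $2 million.
revised_sales = [400_000, 2_000_000, 10_000_000]
revised_median = median(revised_sales)

print(f"Median: ${median_price:,.0f}")
print(f"Average: ${average_price:,.0f}")
print(f"Median after one sale changes: ${revised_median:,.0f}")
```

With so few sales, a change to a single house doubles the median, which is one reason to ask how many sales lie behind a published figure.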

Is the average or median house price misleading?

In working on Everydata, Johnson and Gluck contacted many experts and have used quotes from them to good effect. For example, they quote Emily Oster, author of Expecting Better: Why the Conventional Pregnancy Wisdom is Wrong, saying “I think the biggest issue we all face is over-interpreting anecdotal evidence” and “It is difficult to force yourself to ignore these anecdotes – or, at a minimum, treat them as just one data point – and draw conclusions from data instead.” (p. 6)

Everydata addresses sampling, averages, correlations and much else, indeed too much to summarise here. If Johnson and Gluck have a central message, it is to be sceptical of data and, if necessary, investigate in more depth. This applies especially to data encountered in the mass media. For example, the authors comment, “We’ve seen many cases in which a finding is reported in the news as causation, even though the underlying study notes that it is only correlation.” (p. 46) Few readers ever check the original research papers to see whether the findings have been reported accurately. Johnson and Gluck note that data coming from scientific papers can also be dodgy, especially when vested interests are involved.

The value of a university education

For decades, I’ve read stories about the benefits of a university education. Of course there can be many sorts of benefits, for example acquiring knowledge and skills, but the stories often present a figure for increased earnings through a graduate’s lifetime.


This is an example of aggregated data. Not everyone benefits financially from having a degree. If you’re already retired, there’s no benefit.

There’s definitely a cost involved, both fees and income forgone: you could be out earning a salary instead. So for a degree to help financially, the extra income you earn afterwards has to more than make up for the fees and the income you gave up while studying.

The big problem with calculations about benefits is that they don’t compare like with like. They compare the lifetime earnings of those who obtained degrees to the lifetime earnings of those who didn’t, but these two groups are not random samples of the same population. Compared to those who don’t go to university, those who do are systematically different: they tend to come from well-off backgrounds, to have had higher performance in high school and to have a greater capacity for studying and deferred gratification.
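A small made-up example in Python shows how large this distortion can be. Every number here is an assumption chosen for illustration: a degree is taken to add 5 (thousand dollars a year) to earnings, a well-off background adds 20, and people from well-off backgrounds are far more likely to hold a degree.

```python
from statistics import mean

DEGREE_BONUS = 5       # assumed true effect of a degree on earnings
BACKGROUND_BONUS = 20  # assumed effect of a well-off background


def earnings(well_off, degree):
    """Hypothetical yearly earnings in thousands of dollars."""
    return 50 + BACKGROUND_BONUS * well_off + DEGREE_BONUS * degree


# (well_off, degree, number of people): invented population of 20.
groups = [(True, True, 8), (True, False, 2),
          (False, True, 2), (False, False, 8)]
people = [(w, d, earnings(w, d)) for w, d, n in groups for _ in range(n)]

# Naive comparison: all graduates versus all non-graduates.
graduates = [e for w, d, e in people if d]
others = [e for w, d, e in people if not d]
naive_gap = mean(graduates) - mean(others)


def gap_within(well_off):
    """Earnings gap between graduates and others of the same background."""
    with_degree = [e for w, d, e in people if w == well_off and d]
    without = [e for w, d, e in people if w == well_off and not d]
    return mean(with_degree) - mean(without)


print("Naive gap:", naive_gap)
print("Gap among the well-off:", gap_within(True))
print("Gap among the rest:", gap_within(False))
```

In this invented population the naive comparison suggests a degree is worth 17, more than three times the assumed true effect of 5, simply because graduates mostly come from well-off backgrounds. Comparing like with like recovers the 5.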

Where’s the study of groups with identical attributes, for example identical twins, comparing the options of careers in the same field with and without a degree? Then there’s another problem. For some occupations, it is difficult or impossible to enter or advance without a degree. How many doctors or engineers do you know without degrees? It’s hardly fair to calculate the economic benefits of university education when occupational barriers are present. A fair comparison would look only at occupations where degrees are not important for entry or advancement, and only performance counts.

A final example

For those who want to go straight to takeaway messages, Johnson and Gluck provide convenient summaries of key points at the end of each chapter. However, there is much to savour in the text, with many revealing examples helping to make the ideas come alive. The following is one of my favourites (footnotes omitted).


Americans are bad at math. Like, really bad. In one study, the U.S. ranked 21st out of 23 countries. Perhaps that explains why A&W Restaurants’ burger was a flop.

As reported in the New York Times Magazine, back in the early 1980s, the A&W restaurant chain wanted to compete with McDonald’s and its famous Quarter Pounder. So A&W decided to come out with the Third Pounder. Customers thought it tasted better, but it just wasn’t selling. Apparently people thought a quarter pound (1/4) was bigger than a third of a pound (1/3).

Why would they think 1/4 is bigger than 1/3? Because 4 is bigger than 3.

Yes, seriously.

People misinterpreted the size of a burger because they couldn’t understand fractions. (p. 101)

John H. Johnson and Mike Gluck, Everydata: The Misinformation Hidden in the Little Data You Consume Every Day (Brookline, MA: Bibliomotion, 2016)

Brian Martin