The perils of measurement

Measuring performance sounds like a good idea, but it has downsides.

How well are you doing your job? And who would like to know? It makes sense to measure outputs, and it makes sense to provide rewards according to these outputs. Actually, though, rewarding people for measured outputs can be harmful.

One problem is that people may try to perform well according to the things that happen to be measured. When police are judged by the number of arrests they make, they may pick easy targets and ignore the harder and bigger cases. They might even give false figures.

During the Vietnam war, US commanders reported body counts. These were grossly inflated, and civilian deaths were counted as deaths of enemy soldiers. The result was that Washington decision-makers thought the war was going much better than it was. In command economies, like the former Soviet Union, unrealistic targets were given to enterprises, with the result that production figures were falsified, corners were cut and unnecessary output was produced. The targets were supposed to lead to increased production but instead became ends in themselves.

In the academic world, the measuring process changes what people do. Enrolment targets can lead to dubious recruitment practices. Rewarding scholars by the numbers of their publications can lead to a reduction in quality and to exploitation of research students. Ranking universities in part by the number of citations to publications by their researchers has led to recruitment of staff simply for their citation counts. (Marc Edwards and Siddhartha Roy have written a superb critique of perverse incentives for academic research.)

In these examples, measurement leads to changes in behaviour by those being measured. The problem occurs when these changes are undesirable. Sometimes the problems are anticipated and sometimes not.

Another disadvantage of measuring performance is that it can undermine intrinsic motivation. In some occupations, for example health and education, many workers are driven by their commitment to helping others. External inducements, such as salary and promotions, are secondary. External inducements can actually reduce intrinsic motivations.

When metrics rule

These issues are comprehensively covered in a new book, The Tyranny of Metrics, by Jerry Z. Muller. “Metrics” here refers to measurements. Muller is not opposed to metrics. He repeatedly observes that many metrics are valuable, helping to identify areas for improvement and identify good practice. But in too many cases, measurements cause problems.

One area for measuring performance is surgery. The success rates of different surgeons can be collected and, in what is called transparency, published. There are some initial benefits: surgeons with very poor outcomes may decide to withdraw from particular operations or from surgery altogether. But if used for ongoing scrutiny, measuring outcomes can lead to surgeons avoiding complex or difficult cases. After all, tackling the most challenging surgeries is likely to lead to a lower success rate.
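The arithmetic behind this incentive can be sketched in a few lines of Python. This is only an illustration: the success probabilities and case counts are invented, not taken from Muller's book, and the function name success_rate is hypothetical. It shows two equally skilled surgeons whose published rates differ solely because of the mix of cases they accept.

```python
def success_rate(routine_cases, complex_cases, p_routine=0.98, p_complex=0.80):
    """Expected overall success rate for a given mix of cases.

    p_routine and p_complex are assumed per-case success probabilities,
    identical for both surgeons (that is, equal skill).
    """
    total = routine_cases + complex_cases
    expected_successes = routine_cases * p_routine + complex_cases * p_complex
    return expected_successes / total


# Same skill, different case mix (all numbers are made up).
takes_hard_cases = success_rate(routine_cases=60, complex_cases=40)
avoids_hard_cases = success_rate(routine_cases=95, complex_cases=5)

print(f"takes hard cases:  {takes_hard_cases:.3f}")   # 0.908
print(f"avoids hard cases: {avoids_hard_cases:.3f}")  # 0.971
```

Under these assumptions, the surgeon who turns away difficult patients looks better on the published figure even though neither surgeon is more skilled than the other.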

If a hospital is judged by the percentage of patients who are able to leave intensive care within a specified number of days, there will be pressure to move some patients out too soon.

The problem is basically that the thing measured becomes a goal in itself. This is aggravated when rewards are attached. When mortgage brokers are paid according to sales, it can lead to shady practices such as giving loans to individuals with no assets or income. This was part of what led to the global financial crisis.

Muller says there are three key components of “metric fixation”: (1) a belief that numbers can replace judgement; (2) a belief that making metrics public ensures accountability; (3) a belief that giving rewards for measured performance is the best way to motivate workers. To these I would add a belief that there are no good alternatives for improving performance.

There is yet another problem with measuring performance. Only some things are easily or accurately measured, and other things are intangible or obscure. In a workplace, outputs that can be measured include sales, share prices, new clients and so forth. Often given short shrift are collegial support, mentoring and morale boosting, which are important but not easily quantified. The result can be that self-interested narcissists get ahead at the expense of those who are generous and supportive.

Muller describes eleven predictable though unintended negative consequences of metrics: goal displacement, short-termism, costs in employee time, declining value of continuing use, proliferation of rules to address faults of metrics, rewarding of luck, discouraging risk-taking, discouraging innovation, discouraging cooperation, degrading of work and eroding productivity. With such a list of negatives, no wonder Muller tries to give credit to metrics when he can. Even so, he actually may be overlooking some of their shortcomings. In the case of police, Muller says that systems to identify areas where crime is more likely to occur are useful for making decisions about deploying police. However, Cathy O’Neil in her book Weapons of Math Destruction argues that identification of at-risk areas may actually be a self-fulfilling prophecy and contribute to racially biased arrest patterns even when individual officers are unbiased.

Muller gives examples of the problems of metrics in quite a few areas: universities, schools, police, military, business, hospitals. His chapters on each of these areas are valuable. But even more valuable is the way his analysis encourages readers to start thinking more critically about metrics.

Other examples

When activists organise a rally, its success is commonly measured by the number of people who attend. Sometimes estimates differ considerably, for example with organisers saying 100,000 people showed up whereas police say 20,000. Discrepant estimates testify to the importance put on the metric of crowd size. What both sides miss are the less observable factors, such as the extent to which participants are energised by the experience and the number who decide to become more deeply involved. Hahrie Han in her book How Organizations Develop Activists distinguishes between mobilising and organising. Mobilising aims to get people who are already sympathetic to take action. Organising aims to develop the motivations and skills of individuals, a transformative process. Counting numbers at rallies is a reasonable way to judge the success of some sorts of mobilising but can be misleading in relation to organising.

I’ve written before about citizen advocacy, in which paid coordinators seek to identify people with intellectual disabilities who have unmet needs and then, for each such protégé recruited, find a member of the community who will be the protégé’s advocate, without any compensation, often on a long-term basis. In Australia, various forms of disability advocacy have been funded by the government. Citizen advocacy was discriminated against by use of a misleading metric. The efforts of paid advocates were measured by the number of separate advocacy actions. However, the efforts of citizen advocacy programmes were measured by the number of new protégé-advocate relationships created. Not only was the support for existing relationships overlooked, but so were the actions of the citizen advocates. The metric made citizen advocacy seem like a boutique (that is, expensive) form of advocacy when actually it is often more cost-effective. (Some funders have become better informed about citizen advocacy.)

One of the challenges in questioning metrics is that understanding their shortcomings requires deep knowledge of what is involved, and this can take time and effort to acquire. It’s so much easier to look at a number of publications or arrests or successful surgeries than to probe into goals and methods of achieving them.

Some metrics continue to be used because they serve the interests of powerful groups. A good example is GDP, gross domestic product, a standard measure of economic activity. Having a big GDP is widely seen as a good thing, and a high per capita GDP is often used as a surrogate for quality of life. The shortcomings of GDP have been analysed for decades. Expenditure on traffic accidents, prisons, planned obsolescence, me-too drugs and oil spills contributes to an increased GDP though these are negatives rather than positives. Producing a $20,000 dress counts as much as 1000 pairs of inexpensive shoes. Various alternatives to GDP have been proposed, such as the human development index. Nevertheless, GDP continues to be used while alternatives are given little attention. This is convenient for governments that tout their economic performance while allowing inequality to increase.

What to do?

If you are unaware of the problems with a particular metric, you can hardly be blamed for relying on it. Let’s assume, though, that you have become aware of the metric’s shortcomings. For example, you might be a police officer aware that total arrest numbers are not a good way to measure effectiveness, or a surgeon aware that the survival rate from an operation is not an ideal way to measure your skill. What should you do?

The cynical response is to aim to score well on the metric even though you realise this may harm actual outcomes for your occupation. This is most easily rationalised by trying to forget about the shortcomings of the metric, denigrating those who question the metric, and pointing to arguments in support of the metric. Basically, you conform to the misleading metric’s imperatives and convince yourself, and maybe try to convince others, that this is the only or best way to proceed.

In contrast, a high-minded response is to ignore the misleading metric and do your job according to what you believe is in the best interests of citizens, patients, your colleagues and other stakeholders. A police officer thus might sacrifice good arrest figures by focusing on more important outcomes. The trouble with this response is that you might miss out on opportunities or even derail your career. Meanwhile, your cynical co-workers get ahead and make decisions that continue the misguided practices.

Another response is to gripe in private about the bad metrics. This sounds pointless but actually can be useful in finding out who else is dissatisfied and potentially building a constituency for change. However, griping can also be an unproductive release of emotion that allows problems to fester.

Rather than just griping, it is possible to promote alternative metrics, assuming they are available. Just using them in conversation can help raise awareness. If friends talk about growth in the economy, you can comment about a worsening in the Gini coefficient (a measure of economic inequality). This can help start conversations and get others thinking about and discussing alternatives.
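For readers who want to see what the Gini coefficient actually computes, here is a minimal sketch in Python. It uses the standard rank-based formula for a list of incomes, without any small-sample correction, and the income figures are invented for illustration. The function name gini and the two sample lists are my own assumptions, not taken from any official statistical source.

```python
def gini(incomes):
    """Gini coefficient of a list of non-negative incomes.

    0 means everyone has the same income; values closer to 1 mean
    income is concentrated in fewer hands.
    """
    values = sorted(incomes)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one income and a positive total")
    # Weight each income by its rank (1 = poorest, n = richest).
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Invented example: the economy grows, but only the top earner gains.
before = [20, 30, 40, 50, 60]
after = [20, 30, 40, 50, 100]

print(f"mean income: {sum(before) / 5:.0f} -> {sum(after) / 5:.0f}")  # 40 -> 48
print(f"Gini:        {gini(before):.2f} -> {gini(after):.2f}")        # 0.20 -> 0.30
```

In this toy case average income rises by a fifth, which would show up as growth, while the Gini coefficient rises from about 0.20 to about 0.30, showing that the gain went entirely to one person.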

If you are enterprising, you can study more about metrics and their shortcomings. Muller’s book is a useful tool. Then you are in a position to make more informed comments or even to publicise concerns and propose alternatives.

Even more time-consuming is development of alternative ways of promoting good practice, which might not involve metrics at all. This is not a task for everyone, but it’s important that some people put energy into it.

It’s worth thinking about different options because no one can do everything. Metrics are all around us, some good, some bad and some pointless. There’s no universal solution to the problems but it’s valuable to be aware of the problems and take action when possible.

Brian Martin