The limits of expertise

In his book ‘The Signal and the Noise’, the American statistician and writer Nate Silver references an interesting study conducted by Philip Tetlock.

“The forecasting models published by political scientists in advance of the 2000 presidential election predicted a landslide 11-point victory for Al Gore. George W. Bush won instead. Rather than being an anomalous result, failures like these have been fairly common in political prediction. A long-term study by Philip E. Tetlock of the University of Pennsylvania found that when political scientists claimed that a political outcome had absolutely no chance of occurring, it nevertheless happened about 15 percent of the time.”

Tim Harford expands upon this brief introduction in his book ‘Adapt: Why Success Always Starts with Failure’:

“Even deep expertise is not enough to solve today’s complex problems.

Perhaps the best illustration of this comes from an extraordinary two-decade investigation into the limits of expertise, begun in 1984 by a young psychologist called Philip Tetlock. He was the most junior member of a committee of the National Academy of Sciences charged with working out what the Soviet response might be to the Reagan administration’s hawkish stance in the Cold War. Would Reagan call the bluff of a bully or was he about to provoke a deadly reaction? Tetlock canvassed every expert he could find. He was struck by the fact that, again and again, the most influential thinkers on the Cold War flatly contradicted one another. We are so used to talking heads disagreeing that perhaps this doesn’t seem surprising. But when we realise that the leading experts cannot agree on the most basic level about the key problem of the age, we begin to understand that this kind of expertise is far less useful than we might hope.

Tetlock didn’t leave it at that. He worried away at this question of expert judgement for twenty years. He rounded up nearly three hundred experts – by which he meant people whose job it was to comment or advise on political and economic trends. They were a formidable bunch: political scientists, economists, lawyers and diplomats. There were spooks and think-tankers, journalists and academics. Over half of them had PhDs; almost all had post-graduate degrees. And Tetlock’s method for evaluating the quality of their expert judgement was to pin the experts down: he asked them to make specific, quantifiable forecasts – answering 27,450 of his questions between them – and then waited to see whether their forecasts came true. They rarely did. The experts failed, and their failure to forecast the future is a symptom of their failure to understand fully the complexities of the present.”

In highly complex circumstances, there is a limit to the value of expertise. A certain amount quickly improves one’s success rate when predicting the future. However, the rate of improvement soon begins to plateau. Before long, even big increases in expertise yield only small gains in the accuracy of one’s predictions.

For wildly complex situations, the most successful predictors incorporate ideas from different disciplines, pursue multiple approaches at the same time, are willing to acknowledge mistakes, and rely more on observation than on theory.

Rory Sutherland, Nate Silver and the measurability bias

In an article for The Spectator, Ogilvy Chairman Rory Sutherland discusses how companies filter prospective employees based on their grades.

“You need to whittle down applications somehow. And to create a spurious veneer of objectivity, recruiters all fall back on the same, lone quantifiable measure (degree class).”

Rory argues that this may not be the best technique.

“I have asked around, and nobody has any evidence to suggest that, for any given university, recruits with first-class degrees turn into better employees than those with thirds (if anything the correlation operates in reverse). There are some specialised fields which may demand spectacular mathematical ability, say, but these are relatively few.”

Reading between the lines a little, Rory believes that to excel in most fields of employment, recruits need an array of skills, most of which cannot be quantified by exam results. Despite this, the majority of the weight in hiring decisions is given to the minority of skills that happen to be measurable.

“In the words of F.A. von Hayek (praise be upon him) ‘Often that is treated as important which happens to be accessible to measurement.’”

To labour the point, we are biased towards the measurable.

You may ask why an ad man is interested in education. The truth is that the measurability bias frequently affects the creative industries. In his TED talk, ‘Life Lessons From an Ad Man’, Rory tells the story of a brief given by Eurostar.

I’ve edited the transcript slightly for brevity.

“The question was given to a bunch of engineers, about 15 years ago: ‘How do we make the journey to Paris better?’ And they came up with a very good engineering solution, which was to spend six billion pounds building completely new tracks from London to the coast, and knocking about 40 minutes off a three-and-a-half-hour journey time.

Now, call me Mister Picky, but it strikes me as a slightly unimaginative way of improving a train journey merely to make it shorter. What you should in fact do is employ all of the world’s top male and female supermodels, pay them to walk the length of the train, handing out free Chateau Petrus for the entire duration of the journey. Now, you’ll still have about three billion pounds left in change, and people will ask for the trains to be slowed down.”

While he’s painting a humorous picture, his point is clear. The speed of the journey was measurable, so it was overvalued. The experience of the journey was far harder to measure, so it was undervalued.

Nate Silver covers the same point in ‘The Signal and the Noise’:

“Statheads can have their biases too. One of the most pernicious ones is to assume that if something cannot easily be quantified, it does not matter. In baseball, for instance, defence has long been much harder to measure than batting or pitching. In the mid-1990s, Beane’s Oakland A’s teams placed little emphasis on defence, and their outfield was manned by slow and bulky players, like Matt Stairs, who came out of the womb as designated hitters. As analysis of defence advanced, it became apparent that the A’s defective defence was costing them as many as eight to ten wins per season, effectively taking them out of contention no matter how good their batting statistics were.”

It’s important to uncover what aspects of a product or service consumers actually value. What demands aren’t being met? What is valued but underserved? Then, when you build a proposal, build on top of that understanding. Don’t build on something purely because it is measurable.

Value and measurability are independent.