Regression to the Mean


Suppose we were walking down the street together and encountered an especially tall person. I immediately turn to you and make the following confident prediction: “I predict that the next person we encounter will be shorter in height.” Later in our walk we encounter an especially short person. Once again I turn to you and confidently predict: “I bet that the next person we encounter will be taller.” Are you impressed if my predictions turn out to be correct?

Most people are close to average height. If we had to predict the height of an unknown person, our best prediction would be that they are of average height. Compared to a tall person, a person of average height would necessarily be shorter; and similarly, compared to a short person, a person of average height would be taller.

In general, a tall person is likely to be followed by someone who is shorter. To the uninitiated, it might seem that the presence of the tall person somehow caused the next person to be shorter. In fact, the operative principle is quite simple: most people are near average height. It is not the case that the tall person caused a shorter person to appear. Instead, the next person is most likely to be of average height, and average height is shorter than a tall person.

The phenomenon we have just described is known as regression to the mean or regression toward the mean. Statisticians define regression to the mean as follows:

An extreme measurement is likely to be followed by a less extreme measurement.

Regression to the mean is not a causal phenomenon: nothing about the extreme measurement causes the next measurement to be less extreme. Instead, it is what logicians call a tautology, a necessary statistical truth. Unfortunately, it has proved to be one of the most difficult concepts for human minds to understand.
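The street-corner prediction can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: heights are drawn independently from a normal distribution, and the mean (170 cm), standard deviation (10 cm), and the 190 cm cutoff for "especially tall" are made-up values, not figures from the text. The function name is likewise hypothetical.

```python
import random


def fraction_followed_by_shorter(n_pairs=100_000, mean_cm=170.0, sd_cm=10.0,
                                 tall_cutoff_cm=190.0, seed=0):
    """Simulate pairs of passers-by and look only at pairs whose first
    person is especially tall. Return how many such pairs there were and
    the fraction in which the second person was shorter.

    All numeric defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    tall_count = 0
    shorter_count = 0
    for _ in range(n_pairs):
        first = rng.gauss(mean_cm, sd_cm)
        second = rng.gauss(mean_cm, sd_cm)  # independent of the first person
        if first >= tall_cutoff_cm:
            tall_count += 1
            if second < first:
                shorter_count += 1
    return tall_count, shorter_count / tall_count


if __name__ == "__main__":
    tall, frac = fraction_followed_by_shorter()
    print(f"{tall} especially tall people; next person shorter in {frac:.1%} of cases")
```

The second height is generated without any reference to the first, so the tall person cannot be "causing" anything. The prediction succeeds almost every time simply because most draws land near the mean.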

The Sports Illustrated Cover Jinx

One of the best-known examples of regression to the mean is the so-called Sports Illustrated Cover Jinx. Sports Illustrated is a popular magazine that covers all kinds of sports, from boxing to volleyball. As with any sports magazine, the cover is normally reserved for a photo of an athlete who has recently achieved something extraordinary. The cover might show a celebratory photo of a cyclist winning the Tour de France, a hockey star celebrating a goal, or a gymnast holding up her Olympic gold medal.

For decades, people have observed that the performance of athletes tends to decline immediately after they appear on the cover of Sports Illustrated. For example, a golfer who makes the cover after winning a series of tournaments may then have trouble placing among the top five finishers for the next several tournaments.

Statisticians have shown that the Sports Illustrated cover jinx is real. It really is the case that athletes tend to do worse after appearing on the cover compared with their recent accomplishments. But statisticians have also established that the effect is entirely attributable to regression-to-the-mean.

Suppose that a basketball player, on average, scores 16 points per game. We tend to think of all scored points as a result of skill alone. However, apart from skill, there are other factors that influence how many points a basketball player scores. For example, a player is apt to score more points when playing against a weaker team. If a star teammate is sidelined with an injury, a player may receive more passes from the other players, and that will increase the likelihood of scoring more points. On the other hand, a player is likely to score fewer points if the coach keeps him on the bench rather than playing in the game.

You can think of the number of points scored per game as reflecting two broad factors: (1) the true skill level of the player, and (2) other factors that have nothing to do with the player’s skill. Suppose that the player’s skill is truly stable and doesn’t change from game to game. Nevertheless, the number of points scored by that player will still vary because of the other factors. As a result, the number of points scored in successive games will vary around the player’s true skill level: 14 points, 18 points, 22 points, 11 points, 17 points, etc. That is, we expect random variation around the true population mean, the true average number of points scored.

If you flip a coin enough times, there will be times when the coin produces long runs of identical outcomes by chance alone. For example, by chance, a coin may turn up heads 12 times in a row. There are literally hundreds of professional basketball players who play in the NBA. Simply by chance, a particular player may have a string of games in which he scores lower than his average number of points per game. This is not necessarily because the player’s skill has declined. It simply occurs by chance.
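The arithmetic behind "enough times" can be made concrete. The sketch below computes the chance of 12 heads in 12 fair flips, and then the chance that such a run shows up at least once among many independent 12-flip sequences. The helper names and the choice of 4,096 sequences are illustrative assumptions, not values from the text.

```python
def prob_all_heads(n_flips):
    """Probability that a fair coin lands heads on every one of n_flips tosses."""
    return 0.5 ** n_flips


def prob_at_least_once(p_single, n_attempts):
    """Probability that an event with per-attempt probability p_single
    happens at least once in n_attempts independent attempts."""
    return 1.0 - (1.0 - p_single) ** n_attempts


if __name__ == "__main__":
    p12 = prob_all_heads(12)  # 1 in 4096
    print(f"One 12-flip sequence, all heads: {p12:.6f}")
    # Hypothetical: 4096 independent 12-flip sequences.
    print(f"At least one all-heads sequence in 4096 tries: "
          f"{prob_at_least_once(p12, 4096):.3f}")
```

A run that is very unlikely in any single sequence becomes more likely than not once there are thousands of opportunities for it to occur.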

Similarly, with enough players and enough games, some player will have a string of games in which he scores higher than his average number of points-per-game. Instead of scoring an average of 16 points per game, we might see a series of ten games in which the player scores an average of 22 points per game. The natural tendency is to believe that this increase is due to an improvement in the player’s skill. Indeed, it may be true that the player has improved. However, statistics teaches us to expect these things to occur by chance. In the case of the coin-flips, we understand that the coin is not “improving” merely because of a sequence of heads.

Scoring many points is likely to draw attention, and this is likely to invite a journalist to write about the athlete’s “extraordinary improvement” in recent games. Sure enough, the athlete ends up with his photo on the cover of Sports Illustrated.

If a coin produces 12 heads in a row, the probability of tails on the next toss is still 0.5. It is possible that the string of heads will continue, but it is highly unlikely that it will last for long. At some point, the coin will appear to return to its long-term average: 50% heads and 50% tails. (Remember, extreme values tend to be followed by less extreme values.) Similarly, after a long string of “successes,” the athlete is likely to return to his long-term average.

Statisticians have formally shown that the purported Sports Illustrated cover jinx is entirely a consequence of regression to the mean. If we encounter a series of five tall people, the likelihood is that the next five people will be closer to average height. The people didn’t “get shorter.” It’s simply that most people are near average height. Similarly, the athlete’s skill didn’t decline. It’s simply that his performance in most games will be around his average level.
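The two-factor account of scoring (stable skill plus unrelated factors) can be sketched as a simulation. This is a minimal illustration, not the analysis the statisticians performed: every player is assumed to have the same fixed true average of 16 points, the other factors are modeled as normally distributed noise, and the league size, noise level, and function name are hypothetical.

```python
import random
import statistics


def simulate_cover_jinx(n_players=300, n_games=10, true_mean=16.0,
                        noise_sd=5.0, n_seasons=200, seed=1):
    """In each simulated season, find the player with the best average over
    a stretch of n_games (the 'cover athlete'), then record that player's
    average over the next n_games.

    Skill never changes: every game is true_mean plus fresh random noise.
    All numeric defaults are illustrative assumptions.
    """
    rng = random.Random(seed)

    def stretch_average():
        return statistics.fmean(
            [rng.gauss(true_mean, noise_sd) for _ in range(n_games)]
        )

    before, after = [], []
    for _ in range(n_seasons):
        first_stretch = [stretch_average() for _ in range(n_players)]
        # The hottest player of the stretch lands on the cover.
        before.append(max(first_stretch))
        # Same skill, fresh noise, for the cover athlete's next stretch.
        after.append(stretch_average())
    return statistics.fmean(before), statistics.fmean(after)


if __name__ == "__main__":
    hot, later = simulate_cover_jinx()
    print(f"Cover athlete's average before the cover: {hot:.1f}")
    print(f"Same athlete's average afterwards:        {later:.1f}")
```

Selecting the best stretch out of hundreds guarantees an unusually high "before" figure. The "after" figure falls back toward 16 even though no player's skill was altered anywhere in the code.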

Rewards and Punishments

If you want someone to behave in a particular way, which is more effective: Punishing the person for poor behavior? Or rewarding the person for good behavior?

Psychological research (with both humans and other animals) has established that rewards are more effective in shaping behavior than punishments. Unfortunately, people have difficulty believing the research. As you are about to discover, the reason why we have difficulty believing this is linked to regression to the mean.

Suppose you are trying to teach someone how to do something. There are several possible feedback strategies. One approach is to scold them when they fail and praise them when they succeed. Another approach would be to praise them when they succeed and say nothing when they fail. A third approach would be to scold them when they fail and say nothing when they succeed. A fourth approach would be to say nothing at all. Each of these four strategies has different consequences.

The psychologist Daniel Kahneman (winner of the 2002 Nobel Prize in Economics) tells the following pertinent story:

“I had the most satisfying Eureka experience of my career while attempting to teach flight instructors that praise is more effective than punishment for promoting skill-learning. When I had finished my enthusiastic speech, one of the most seasoned instructors in the audience raised his hand and made his own short speech, which began by conceding that positive reinforcement might be good for the birds, but went on to deny that it was optimal for flight cadets. He said, ‘On many occasions I have praised flight cadets for clean execution of some aerobatic maneuver, and in general when they try it again, they do worse. On the other hand, I have often screamed at cadets for bad execution, and in general they do better the next time. So please don’t tell us that reinforcement works and punishment does not, because the opposite is the case.’”

Notice what’s going on. When trying to learn a complex skill, we often require many attempts. Whether an attempt fails or succeeds depends on many factors. These include our current skill level and, to some degree, luck (chance). Sometimes we hit the target mainly by chance. Sometimes our technique is good, but chance intervenes so that we miss the target.

Our actual performance on any trial will fluctuate randomly around the mean value of our true current skill. Sometimes we will do better, and sometimes worse. But most trials will be near our average skill level. Recall that an extreme measurement is likely to be followed by a less extreme measurement. When we do very poorly, the next trial is likely to be better—simply by chance. Conversely, when we do very well, the next trial is likely to be worse.

Now consider what happens if someone is scolding or praising us. Because of regression to the mean, scolding someone for bad performance is likely to be followed by an improvement in performance (by chance). But the improvement is not due to the scolding; it is due to chance. Similarly, praising someone for especially good performance is likely to be followed by a decline in performance (by chance). The scolding or praise may have little effect. Nevertheless, what is the instructor likely to conclude about the value of scolding or praise?
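The instructor's experience can be reproduced in a simulation in which feedback has no effect at all. The sketch below assumes a cadet with constant skill whose score on each attempt is that skill plus normally distributed noise. The "instructor" praises any attempt more than one standard deviation above the skill level and scolds any attempt more than one standard deviation below it. The scoring scale, the cutoff, and the function name are illustrative assumptions.

```python
import random
import statistics


def feedback_illusion(n_trials=100_000, skill=0.0, noise_sd=1.0,
                      cutoff=1.0, seed=2):
    """Return the average change in score on the attempt following praise
    and on the attempt following scolding.

    Feedback is never fed back into the scores: each attempt is simply
    skill plus fresh noise. All numeric defaults are illustrative.
    """
    rng = random.Random(seed)
    scores = [rng.gauss(skill, noise_sd) for _ in range(n_trials)]

    change_after_praise = []
    change_after_scold = []
    for previous, following in zip(scores, scores[1:]):
        if previous > skill + cutoff:      # unusually good: praised
            change_after_praise.append(following - previous)
        elif previous < skill - cutoff:    # unusually bad: scolded
            change_after_scold.append(following - previous)

    return (statistics.fmean(change_after_praise),
            statistics.fmean(change_after_scold))


if __name__ == "__main__":
    praised, scolded = feedback_illusion()
    print(f"Average change after praise:   {praised:+.2f}")
    print(f"Average change after scolding: {scolded:+.2f}")
```

Performance drops after praise and rises after scolding, although the feedback never enters the calculation. An instructor watching only these before-and-after changes would see exactly the pattern the flight instructor described.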

Kahneman continues his story:

“This was a joyous moment, in which I understood an important truth about the world: because we tend to reward others when they do well and punish them when they do badly, and because there is regression to the mean, it is part of the human condition that we are statistically punished for rewarding others and rewarded for punishing them. I immediately arranged a demonstration in which each participant tossed two coins at a target behind his back, without any feedback. We measured the distances from the target and could see that those who had done best the first time had mostly deteriorated on their second try, and vice versa. But I knew that this demonstration would not undo the effects of lifelong exposure to a perverse contingency.”

We live in a world in which we believe scolding is useful, largely because of regression to the mean. At the same time, we come to believe that praise is not very effective, principally because of regression to the mean. Kahneman is right to regard the situation as “perverse.”

References:

Daniel Kahneman (2012). Thinking, Fast and Slow. New York: Farrar, Straus and Giroux.