Measurement


Against Measurement

The very concept of measuring things seems antithetical to the humanistic spirit. Why would anyone attempt to measure something like happiness or musicality? Many artworks are best regarded as priceless, and the value of a human life is infinite. Counting and measuring seem inherently dehumanizing.

Consider the case of the so-called Intelligence Quotient or IQ. IQ measurements have been used for all sorts of nefarious purposes, including as criteria for categorizing people as “idiots” or “morons.” In The Mismeasure of Man, Stephen J. Gould argued vociferously against attempts to characterize people’s “intelligence.” Intelligence measures are linguistically and culturally biased. What counts as “intelligent” behavior for a Wall Street banker may be quite different from what counts for a Kalahari Bushman. Surely, comparing the intelligence of different human beings is egregious and barbaric.

Let’s consider a concrete example of the use of IQ measures. In the past, lead was a common ingredient in paint. However, several decades ago, lead was banned from paint because it was found to be detrimental to health, especially the health of children. Animal studies implicated lead as bad for brain development. But how could researchers be sure that the lead in household paints was having a detrimental effect? After all, paint just sits on walls; it is rare that a child will eat a paint chip. The quantities of lead that are released to the air are minuscule. Surely, lead paint is not a significant health concern. Yet it was discovered that the IQs of children who lived in houses with lead paint were lower than the IQs of children from matched socio-economic backgrounds who lived in houses without lead paint.

IQ may be only a “crude” estimate of a person’s mental functioning, but measuring IQ proved to be invaluable in improving the quality of health for millions of people. Similarly, differences in IQ also proved to be essential in discovering the terrible effects of methyl mercury on children’s mental development (Hubbard, 2010, p.40). How would researchers have ever discovered these toxic effects without measuring IQ?

Far from being dehumanizing or barbaric, in these cases, the measurement of IQ resulted in benevolent and humane consequences. Like anything else, measurements can be used for both moral and immoral ends.

In Praise of Measurement

As we discussed earlier, there are two main advantages to measurement. First, if we want to invite the world to tell us that we’re wrong, we need measurement in order to make it clear when we’re wrong (“We recognize failure by drawing a line in the sand”). Second, measurement provides opportunities to notice phenomena whose existence we might otherwise never see.

Nevertheless, we must address two questions regarding measurement: First, aren’t there some things that simply can’t be measured? And second, aren’t there many things (like happiness) that can’t be measured with any reasonable precision?

Definition of Measurement

Hubbard (2010, p.23) provides a good definition of measurement: “A quantitatively expressed reduction of uncertainty based on one or more observations.” In order to be useful, a measurement doesn’t need to eliminate all uncertainty. A useful measurement simply needs to be better than what you might guess.

There are many things that people regard as intangible or immeasurable. Things like “quality,” “creativity,” or “conscientiousness” seem to exclude the possibility of measurement. First, dispense with the idea that measurement is about precision. The question is whether we can estimate through observation. Our measurement will never fully grasp the concept: our aim is not to “essentialize.” We can never truly measure “love” in the same way that we can never truly measure “height.” In each case we make estimates using operationalizations that are approximations of the true concept.

If we care about something, then it must have consequences in the world. If something has consequences, then it must be possible to observe or recognize those consequences. If we can observe or recognize consequences, then it must be possible to see when the consequences are greater or smaller. That is, we must be able to detect amount. If we can observe differences in amount, then we can estimate that amount.

Consider, for example, the following problems, posed by the administration of the Cleveland Orchestra: How do we know whether the quality of orchestral performances is getting better or getting worse? How do we know whether one conductor is better than another potential conductor? There are lots of possible ways of addressing these questions. We might ask the members of the orchestra for their opinions. We might count the number of good and bad newspaper reviews. We might poll the audience members for their opinions.

What the Cleveland Orchestra did was something very simple. They kept count of the number of standing ovations (Hubbard, 2010, p.34). This approach has several advantages. First, it is much easier to tabulate than running audience surveys or coding newspaper reviews. In addition, since the orchestra depends financially on ticket sales, the response of the audience is more important than the judgments of music critics, or even the musicians themselves. Of course, there is more to musical quality than whether a performance evokes a standing ovation. “Musical quality” is ephemeral—we will never be able to directly measure this. Instead, the aim is to make an estimate by observing real-world consequences that are likely related to the concept of interest.

David Moore, the former president of the American Statistical Association, offers the following advice: “If you don’t know what to measure, measure anyway. You’ll learn what to measure” (as quoted in Hubbard, 2010, p.31). All measurements begin as rather crude estimates. As you continue to measure something, you will discover various problems, and slowly refine your measure, both in terms of precision, and in terms of what you should be measuring. Like research itself, measurement methods become more refined with experience.

Fermi Questions

Enrico Fermi was one of the star theoretical physicists of the twentieth century. Fermi got in the habit of estimating answers before carrying out a detailed calculation. This habit often prevented costly mistakes by providing a check that the detailed calculations were reasonable. In a famous example, Fermi estimated the size of an atomic test explosion by dropping pieces of paper from his hand during the blast. He estimated the size as 10 kilotons. The detailed calculation gave around 20 kilotons.

Fermi would ask his students to estimate various things as exercises. For example, a student might be asked to estimate the number of windows in New York City. The most famous of these so-called “Fermi questions” was: Estimate the number of piano tuners in Chicago.

The population of the greater Chicago area is roughly 10 million. If the average household contains 2.5 people, then there are roughly 4 million households in Chicago. If 1 in 100 households has an acoustic piano, then there are 40,000 pianos in Chicago. If each piano is tuned once every two years, then 20,000 pianos are tuned per year. Suppose that a piano tuner tunes 4 pianos per day (including transportation). Also suppose that each tuner works perhaps 250 days per year. Consequently, each piano tuner tunes roughly 1,000 pianos per year. If there are 20,000 piano tunings each year, then Chicago can support only about 20 full-time professional piano tuners. There currently are 16 piano tuners listed in the directory for Chicago (http://www.chacha.com/question/how-many-piano-tuners-are-there-in-chicago). At the time that Fermi posed this question (in the 1940s), many more households had pianos. So in the 1940s, Fermi’s estimate was 100 tuners. At the time, there were roughly 80 tuners employed.
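The chain of assumptions in this estimate can be written out as a short calculation. The following is only an illustrative sketch in Python; the function name and parameter names are invented here, and the input values are simply the rough guesses used in the paragraph above, not observed data.

```python
def estimate_piano_tuners(population, people_per_household,
                          households_per_piano, years_between_tunings,
                          tunings_per_day, workdays_per_year):
    """Fermi-style estimate of how many full-time tuners a city can support.

    Every argument is a rough guess, not a measurement.
    """
    households = population / people_per_household
    pianos = households / households_per_piano
    tunings_needed_per_year = pianos / years_between_tunings
    tunings_per_tuner_per_year = tunings_per_day * workdays_per_year
    return tunings_needed_per_year / tunings_per_tuner_per_year


# Guesses taken from the text: 10 million people, 2.5 people per household,
# 1 piano per 100 households, one tuning every 2 years,
# 4 tunings per day, 250 working days per year.
tuners = estimate_piano_tuners(
    population=10_000_000,
    people_per_household=2.5,
    households_per_piano=100,
    years_between_tunings=2,
    tunings_per_day=4,
    workdays_per_year=250,
)
print(tuners)  # 20.0
```

Writing the estimate this way makes it easy to see how sensitive the answer is to each guess. For instance, if pianos were five times as common, the same calculation would give 100 tuners.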

Fermi estimates are not true measurements since they involve no observation. But these sorts of “back-of-the-envelope” estimates often prove helpful when engaged in research.

Measuring the Unmeasurable

When we think of “unmeasurable” things, we tend to think of concepts like “honesty,” “stylishness,” or “reputation.” However, even straightforward concepts can raise practical difficulties when it comes to measurement. Biologists, for example, are commonly interested in such questions as “How many fish are there in this lake?” Apart from draining all of the water, it is hard to imagine how one might measure this. Nevertheless, biologists have developed useful estimation methods. In the case of counting fish, biologists commonly use a catch-and-release method. For example, the biologist might begin by catching 100 fish. These fish are tagged and then released back into the lake. The fish are given time to mix throughout the lake. The biologist then returns and catches another 100 fish. Suppose that 2 fish in the second catch were tagged. This suggests that the original catch of 100 fish represents about 2 percent of the fish in the lake, implying that the lake contains about 5,000 (catchable) fish.
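The proportional reasoning in the fish example can be sketched in a few lines of Python. This simple ratio is often called the Lincoln-Petersen estimator in the mark-recapture literature. The function and parameter names below are invented for illustration, and the sketch assumes that tagged fish mix evenly through the lake and are as easy to catch as untagged fish.

```python
def estimate_population(tagged_first_catch, second_catch_size,
                        tagged_in_second_catch):
    """Estimate total population from a tag-and-recapture experiment.

    Assumes the tagged share of the second catch matches the tagged
    share of the whole population.
    """
    if tagged_in_second_catch == 0:
        # With no recaptures the ratio is undefined; a larger second
        # catch would be needed before any estimate can be made.
        raise ValueError("no tagged fish were recaptured")
    return tagged_first_catch * second_catch_size / tagged_in_second_catch


# Numbers from the text: tag 100 fish, catch 100 more, find 2 tagged.
print(estimate_population(100, 100, 2))  # 5000.0
```

With so few recaptured fish the estimate is crude: finding 4 tagged fish instead of 2 would halve it to 2,500. Even so, it narrows the range of plausible answers far more than a guess would.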

Similarly, empirical musicologists have devised a number of techniques for measuring various things, such as how often a person has a tune stuck in their head.

Subjective Judgment

In measuring things, one of the simplest approaches is to poll people’s subjective impressions. For example, in order to assess the amount of pain a person is experiencing, medical personnel will simply ask the patient to judge, on a scale of 1 to 10 (where 10 is the most intense pain they have ever experienced), how they would characterize their current pain level. Similarly, we can ask people simply to judge how “beautiful” a musical passage is, or judge the “skillfulness” of a given performance.

There are good reasons to be wary of subjective judgments. We’d often prefer more objective measures. If we want to know how much a particular car is worth, we can look up the manufacturer’s suggested retail price. However, many of the measures we consider objective are, in fact, subjective measures. How much is gold worth? It’s worth whatever people will pay for it. A listing on a stock exchange is simply a record of the current price people are paying. Is a painting by Rembrandt truly priceless? If you put it on the market, you may well find that it is only worth $20 million (infinitely smaller than priceless). How much do you value good medical care? Let’s find out what proportion of your income you are paying for health insurance. How valuable is friendship? Let’s find out how much of your free time you spend with your friends.

Conclusion

When most people hear the word “measurement,” they tend to think of something “precise.” However, the best way to think of measurement is as a way of reducing uncertainty. Fundamentally, all measurements are types of estimates. Even the most imprecise measurements can prove useful.

Finally, despite our intuitions to the contrary, there isn’t anything that can’t be estimated, and therefore there isn’t anything that can’t be measured. In his book, How to Measure Anything, Douglas Hubbard presents a strong case that even the most intangible of concepts can be measured.

References

Stephen J. Gould (1981). The Mismeasure of Man. New York: W.W. Norton.

Douglas W. Hubbard (2010). How to Measure Anything: Finding the Value of “Intangibles” in Business. 2nd edition. Hoboken, NJ: John Wiley & Sons.

Claude Shannon (1948). A mathematical theory of communication. The Bell System Technical Journal, Vol. 27, pp. 379-423, 623-656.

Stanley Smith Stevens (1946). On the theory of scales of measurement. Science, Vol. 103, pp. 677-680.