Double-Use Data


Recall that whenever we look at a set of data, we can immediately form a number of ideas that might account for any patterns evident in the data. However, these ideas are post hoc, not a priori. After looking at the data, it would be wrong to formulate a theory and then present the data as though they were an independent test of that theory. It is fine to use data to formulate a theory, but it is not fine to use a single set of data both to formulate the theory and to "independently" test it. We refer to this error as double-use data.

A historical example of double-use data comes from the Theory of Continental Drift in geology. Today, there is very good evidence that the continents ride on "plates" that slide very slowly across the earth's surface. However, the theory had an inauspicious beginning. If you look at a map of the world, there seems to be a certain similarity between the eastern coastlines of North and South America and the western coastlines of Europe and Africa. One could easily imagine all four continents pushed together like jigsaw puzzle pieces.

Over the past century, excellent evidence consistent with this theory has been assembled. However, this evidence was not available when the theory was first proposed. If we ask, "What inspired the theory?", the answer was "Look at how the continents seem to fit together like jigsaw puzzle pieces." If we then ask, "What evidence do we have that is consistent with the theory?", the answer was the same: "Look at how the continents seem to fit together like jigsaw puzzle pieces." In other words, the same evidence served both as the inspiration for the theory and as evidence for it. This is circular reasoning. The Theory of Continental Drift was slow to gain acceptance in large part because of this lack of independent evidence.

Normally, we are not interested in where theories come from (the context of discovery). Instead, we are interested in testing theories (the context of legitimation). However, we are right to question a theory if the context of discovery and the context of legitimation are the same.

When you look at observations you have collected, any theory you form is, from that moment on, a post hoc theory. You cannot then claim that your theory was a priori and use those same observations as evidence that tests it. Once you have looked at your data, you cannot pretend that you predicted the data. In short, post hoc theories are not "prophetic": you cannot use the language of prediction that is the essence of hypothesis testing.
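The cost of double use can be made concrete with a small simulation. This is a hypothetical illustration rather than part of the original discussion, and it rests on stated assumptions: 20 measured variables that are pure noise (so no real effect exists), 30 observations per variable, and a simple two-sided z test with known unit variance. A researcher "looks at the data," forms the post hoc idea that the most extreme variable is special, and then tests that idea either on the same data or on a fresh sample. The names `z_score` and `simulate` are invented for this sketch.

```python
import math
import random


def z_score(sample):
    """z statistic for H0: mean == 0, assuming (hypothetically) known unit variance."""
    return sum(sample) / len(sample) * math.sqrt(len(sample))


def simulate(trials=2000, n_vars=20, n_obs=30, seed=1):
    """Compare a post hoc 'test' on the same data with a test on fresh data.

    Every variable is pure noise, so every 'significant' result is a false positive.
    """
    rng = random.Random(seed)
    critical = 1.96  # two-sided 5% threshold for a z statistic
    double_use_hits = 0
    fresh_hits = 0
    for _ in range(trials):
        # Exploratory data: look at every variable and pick the most striking one.
        data = [[rng.gauss(0.0, 1.0) for _ in range(n_obs)] for _ in range(n_vars)]
        scores = [z_score(v) for v in data]
        best = max(range(n_vars), key=lambda i: abs(scores[i]))
        # Double use: "test" the post hoc idea on the very data that suggested it.
        if abs(scores[best]) > critical:
            double_use_hits += 1
        # Independent test: collect new observations for the chosen variable only.
        fresh = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        if abs(z_score(fresh)) > critical:
            fresh_hits += 1
    return double_use_hits / trials, fresh_hits / trials


if __name__ == "__main__":
    double_use_rate, fresh_rate = simulate()
    print(f"false-positive rate, same data reused: {double_use_rate:.2f}")
    print(f"false-positive rate, fresh data:       {fresh_rate:.2f}")
```

Under these assumptions, reusing the data should produce a "significant" result in roughly 1 - 0.95^20, or about 64%, of trials, while the fresh sample should stay near the nominal 5%. The exact figures vary with the seed and the number of trials.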

It is important to formulate your theories before you collect your data, and before you examine or analyze it. Since we sometimes forget where ideas come from, be sure to keep a research diary or lab notebook. It will help you keep track of which ideas are a priori and which are post hoc.

Slogan: Beware of the post hoc theory.

Remember that everything tends to seem obvious in retrospect (hindsight is 20/20). We can make up a story for just about any set of observations. The true test is making up the story first, before the observations are in (i.e., prediction).