Explore-then-Test Approach


One of the strongest research strategies combines an exploratory study with a subsequent study that tests a resulting hypothesis. The exploratory study acquaints the researcher with the phenomenon and inspires the researcher to formulate an interpretation or explanatory theory. This theory is then used to generate one or more conjectures. One of the conjectures is refined into one or more testable hypotheses, and these hypotheses are then tested using either a correlational method or an experimental method. We can refer to this research strategy as the explore-then-test approach.

Of course, one cannot use the same data both to inspire a hypothesis and to test it (that is, one cannot double-use data). A new set of data is required in order for the test to be truly a priori.

The Reserved Dataset

Suppose that a researcher has administered an exploratory survey questionnaire to 100 people. Examining the surveys, the researcher observes certain correlations. For example, the researcher might discover that people who enjoy listening to sad music score high on questions measuring the personality trait known as “openness.” Notice that the research employs an exploratory correlational approach, and that the discovery is “post hoc.”

Having made this observation, the conscientious researcher would then go ahead and test it. The hypothesis is that “people who enjoy listening to sad music will score high on the openness personality trait.” Our researcher might form a new questionnaire (or even reuse the earlier questionnaire) to collect new data. With this new data, the hypothesis can now be tested as an a priori hypothesis.
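As a rough illustration of what such a test might look like, the sketch below computes a Pearson correlation on newly collected data and estimates a one-sided p-value by permutation. This is only one of several reasonable ways to test the hypothesis, not a prescribed method. The function names, the choice of a one-sided permutation test, and the ratings themselves are all assumptions made up for illustration; they are not real survey results.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def one_sided_permutation_p(xs, ys, n_permutations=2000, seed=None):
    """Return (observed r, one-sided p-value) for a positive association.

    The p-value estimates how often a correlation at least as large as
    the observed one arises when the pairing of xs and ys is shuffled.
    """
    rng = random.Random(seed)
    observed = pearson_r(xs, ys)
    shuffled = list(ys)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if pearson_r(xs, shuffled) >= observed:
            at_least_as_large += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least_as_large + 1) / (n_permutations + 1)


# Made-up ratings for eight hypothetical new respondents (illustration only).
sad_music_enjoyment = [1, 2, 3, 4, 5, 6, 7, 8]
openness_score = [2, 1, 4, 3, 6, 5, 8, 7]

r, p = one_sided_permutation_p(sad_music_enjoyment, openness_score, seed=1)
print(f"r = {r:.3f}, one-sided permutation p = {p:.4f}")
```

The important point is procedural rather than computational: the hypothesis and the direction of the test are fixed before the new data are examined.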

In many circumstances, returning to collect a second set of data can prove to be inconvenient or costly. For example, an ethnomusicologist may have difficulty obtaining funding to return to a remote location for a second round of data collection. In these situations, a clever strategy is to make use of the so-called reserved dataset, also known as held-out data.

Suppose that our researcher had originally collected survey questionnaire data from 200 people (rather than 100). Before examining any of the data, the researcher randomly selects 100 surveys and stores them out of reach. (The researcher simply avoids looking at these 100 surveys.) This will be the “reserved dataset” or “held-out data.” Examining the remaining surveys, the researcher observes the correlation between enjoyment of sad music and “open” personalities. As before, this is a “post hoc” discovery. It would be nice to test this as a proper a priori hypothesis. In effect, the reserved dataset is equivalent to gathering questionnaire data for 100 new participants. (It’s like collecting the data before we need it.) So the researcher can then test the hypothesis using the held-out dataset.
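The random selection step described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function name split_reserved, the use of integer survey identifiers, and the fixed seed are all hypothetical choices for illustration. The key property is that the split is made at random and before any responses are inspected.

```python
import random


def split_reserved(records, n_reserved, seed=None):
    """Randomly partition records into (exploratory, reserved) lists.

    The reserved list should be set aside unread until a hypothesis
    has been formulated from the exploratory list.
    """
    if not 0 <= n_reserved <= len(records):
        raise ValueError("n_reserved must be between 0 and len(records)")
    rng = random.Random(seed)
    indices = list(range(len(records)))
    rng.shuffle(indices)
    reserved_idx = set(indices[:n_reserved])
    exploratory = [r for i, r in enumerate(records) if i not in reserved_idx]
    reserved = [r for i, r in enumerate(records) if i in reserved_idx]
    return exploratory, reserved


# Hypothetical identifiers for the 200 completed surveys.
survey_ids = list(range(1, 201))

# Split on identifiers alone, so no responses need to be looked at.
exploratory, reserved = split_reserved(survey_ids, n_reserved=100, seed=2024)
print(len(exploratory), len(reserved))
```

Recording the seed (or simply saving the list of reserved identifiers) makes it possible to show later that the held-out surveys were chosen before the exploratory analysis began.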

Notice that the held-out dataset allows the researcher to carry out an exploratory study (whose purpose is to inspire theories and conjectures), followed by proper a priori hypothesis testing, while collecting data only once. This method is especially valuable when the researcher knows that it will be difficult to arrange a second round of data collection.