Data Independence
Suppose you are carrying out a telephone survey that aims to sample likely voting behavior for an upcoming election. Using an auto-dialing system, you are connected to successive random landline telephone numbers in a particular geographical district. Once you make contact, you ask each survey participant (1) whether they are eligible and likely to vote in the election, and (2) which candidate they are likely to vote for. You find that most telephone calls are unsuccessful: either no one answers, or the person is unwilling to participate. You are pleased whenever you encounter a cooperative respondent. As you complete your conversation with one person, you hear another voice in the background; it is the respondent’s husband. Eager to recruit another participant, you ask whether it would be possible to speak with the person’s spouse. They agree, and so you are able to collect data from two people rather than just one.
Savvy researchers discourage this sort of behavior. People who live together have many things in common, including attitudes. For married couples, both spouses are likely to share similar political views. Collecting data from two people who live together is much less useful than collecting data from two unrelated people. When data are collected this way, the results will appear more uniform than they would in a proper representative sample. The data collected from 50 couples are less representative than the data collected from 100 independent people.
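One rough way to put a number on the comparison between 50 couples and 100 independent people is the design-effect approximation used in survey sampling (often attributed to Kish). It is not part of the discussion above and is offered only as a sketch. The function name `effective_sample_size` and the within-couple correlation of 0.6 are illustrative assumptions, not empirical estimates.

```python
def effective_sample_size(n_total: int, cluster_size: int, icc: float) -> float:
    """Approximate number of independent respondents a clustered sample is worth.

    Uses the design effect for equal-sized clusters:
        deff = 1 + (cluster_size - 1) * icc
    where icc is the intraclass correlation (0 = unrelated, 1 = identical).
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc is assumed to lie between 0 and 1 in this sketch")
    design_effect = 1.0 + (cluster_size - 1) * icc
    return n_total / design_effect


# 100 respondents collected as 50 couples (clusters of size 2).
# The value 0.6 is a made-up illustration, not an empirical estimate.
for icc in (0.0, 0.6, 1.0):
    print(icc, round(effective_sample_size(100, 2, icc), 1))
```

Under this approximation, if spouses always gave identical answers (a correlation of 1), the 100 responses would carry the information of only 50 independent respondents. If spouses were no more alike than strangers (a correlation of 0), nothing would be lost.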
Most of the statistical methods commonly used in empirical research assume “independent data.” That is, the analysis methods presume that each datum was collected in a way that avoids undue similarity to the other data arising from a shared origin.
Suppose we asked someone to judge the degree of “expressiveness” in several recorded performances of the same work. If we collected all of our data from a single judge, then our measure of “expressiveness” would really be “Alice’s idea of expressiveness” or “Adam’s idea of expressiveness” rather than “expressiveness.” We may need to recruit many judges in order to increase the representativeness of “expressiveness” judgments. This is the reason why most experiments involve multiple participants rather than relying on a single person.
Now consider the musical stimuli created for such an experiment. Suppose we recruited a single flute player (“Jean-Pierre”) and instructed him to play several renditions with differing degrees of musical expressiveness. Once again, the results might be of limited generality. Instead of judging “expressiveness,” our participants would really be judging “expressiveness as produced by Jean-Pierre.”
Now suppose that we recruited many performers and many listeners, and had the listeners judge the degree of expressiveness of different recorded performances of a passage by Hindemith. Once again, we may have a problem. The problem is that “expressiveness” may pertain only to how performers interpret Hindemith, or how they interpret one particular passage by Hindemith. If our goal is to understand “expressiveness” in general, then we ought to broaden our stimuli to include different passages by different composers.
In each of these cases, the question is: how representative of the population is our sample? How representative of composers’ expressiveness is Hindemith? How representative of performers’ expressiveness is Jean-Pierre? How representative of judges of expressiveness are Alice and Adam?
In an ideal world, we would aim to have truly independent data by having each listener judge just one recording, and by having each recording made by a different performer, on a different musical instrument, playing a unique musical passage written by a different composer from a different period and nation. Rather than having 1 person judge 100 stimuli, it would be better to have 100 people judge our stimuli. And rather than repeating the same 100 stimuli for each participant, it would be better for each participant to hear entirely novel stimuli.
In actual practice, researchers routinely violate the “data independence” ideal by having participants respond to more than one stimulus, having only a handful of judges evaluate something, or using the same stimuli for all participants. However, it is important to recognize that this is merely a matter of convenience. Having made the effort to participate in an experiment, most participants would think it silly to make just a single judgment.
Whenever collecting data, ask yourself: What possible connections exist between this datum and the other data I’m collecting? In what ways do the data have similar origins that reduce the data independence? What practical steps might I take that would increase the data independence? Try to avoid data that clump together because of similar origins. If we think of clumped data as “sticky,” our slogan can be phrased as follows:
Slogan: Avoid sticky data.
Auditions and Music Adjudications
Incidentally, the idea of data independence has repercussions for music auditions and music adjudication. Most music schools conduct auditions and juries intended to evaluate the performance skills of different musicians. Rather than rely on a single judge, it is common to create a two- or three-member panel. If music examiners are permitted to talk with each other before assigning their grades, then the independence of their judgments will be reduced. Suppose, for example, that the three judges mentally judge a performer as roughly A-, B, and C+ respectively. In conversing among themselves, the person who speaks first will have the greatest impact on the final assessment. For example, if the first judge says “That was pretty good,” the second judge might already be thinking that their B could be revised to B+, and the third judge might be thinking that a B- or B might be appropriate. Conversely, if the third judge speaks first (“I think there are some issues here …”), the other two judges may mentally begin to downgrade their initial assessments.
Nearly every institution allows adjudicators to talk among themselves before assigning their grades. So instead of the three original grades: A-, B, and C+, the grades might be A-, B+ and B (if the first judge speaks first) or B, B- and C+ (if the third judge speaks first). Institutions often encourage conversation among the adjudicators because it makes the process look consistent. If a student were to receive a report giving A-, B, and C+, then the student is likely to think the assessments are invalid—and the judges are likely to feel some embarrassment. At many institutions, judges informally collude to avoid differences of more than a single point, so a performer might receive B, B, and B-. In fact, the process really is highly variable. But this variability is masked by collusion. Note that if the assessment process were highly reliable, then only one judge would be necessary.
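The anchoring effect in the adjudication example can be sketched with a toy model in which every judge moves part-way toward the grade of whoever speaks first. Everything in the sketch is an assumption made for illustration: the grade-point values, the linear "pull" rule, the pull strength of 0.5, and the function names. It is not a model of how any real panel behaves.

```python
# Hypothetical grade-point scale; the numeric values are an assumption.
GRADE_POINTS = {"A-": 3.7, "B+": 3.3, "B": 3.0, "B-": 2.7, "C+": 2.3}


def nearest_letter(points: float) -> str:
    """Return the letter grade whose point value is closest to `points`."""
    return min(GRADE_POINTS, key=lambda letter: abs(GRADE_POINTS[letter] - points))


def grades_after_discussion(private, first_speaker, pull=0.5):
    """Move every judge's private grade part-way toward the first speaker's grade.

    pull = 0 means no influence; pull = 1 means everyone adopts the first
    speaker's grade. The linear-pull rule is an illustrative assumption.
    """
    anchor = GRADE_POINTS[private[first_speaker]]
    return [
        nearest_letter((1 - pull) * GRADE_POINTS[grade] + pull * anchor)
        for grade in private
    ]


def spread(letters):
    """Range of the grades in grade points (highest minus lowest)."""
    points = [GRADE_POINTS[grade] for grade in letters]
    return max(points) - min(points)


private = ["A-", "B", "C+"]
first_judge_leads = grades_after_discussion(private, first_speaker=0)
third_judge_leads = grades_after_discussion(private, first_speaker=2)

print(private, round(spread(private), 1))
print(first_judge_leads, round(spread(first_judge_leads), 1))
print(third_judge_leads, round(spread(third_judge_leads), 1))
```

With these assumed values, the sketch reproduces the two outcomes described above (A-, B+, B when the first judge leads; B, B-, C+ when the third judge leads). In both cases the range of the reported grades is half the range of the private grades, even though the judges' private opinions are just as divided as before.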
Notice that the simple act of talking makes the data “sticky.” A person’s judgment is no longer just “their judgment”; it is influenced by the other judges. As a consequence, it is a mistake to view the three grades as being produced by three independent judges. Later, we will learn that statisticians quantify the amount of independent information in a set of data using degrees of freedom (abbreviated df). Colluding reduces the effective degrees of freedom, with the consequence of increasing the likelihood of making both Type I and Type II errors.