Statistical Significance


The word “significant” has a special meaning in statistics. It does not mean that something is important. Instead, it means that the observed results are unlikely to have occurred by chance. The word “unlikely” is important: “Statistically significant” doesn’t mean mean the results didn’t occur by chance. It just means that the results are unlikely to have occurred by chance.

Suppose we have a coin. We suspect that the coin might be biased — it is possible that it has two “heads” or two “tails.” How many coin tosses would we need to observe in order to test our suspicion? Before we answer this question, we must draw our line in the sand. We need to do this a priori — before we make any observations. We draw the line by setting the confidence level. The confidence level can be any value, but two are conventionally used in empirical research — the 95% confidence level, and the 99% confidence level. Suppose we choose the 95% confidence level. This level corresponds to a significance level of 0.05. That is, we would achieve the 95% confidence level if the probability of our observations falls below 0.05.

Now we state our hypothesis:

H1. This coin is not fair.

Our null hypothesis reverses the formulation:

H0. This coin is fair.

We want to observe a series of coin tosses, and test whether the observed outcomes are consistent with the hypothesis that the coin is fair.

We begin with our first observation. We toss the coin once and it comes up heads. The probability of this result is 0.5. Now we toss the coin a second time, and again it comes up heads. Once again, the probability of this result is 0.5. However, the probability of two consecutive tosses coming up heads is 0.5 × 0.5 = 0.25. We toss the coin a third time, and again it comes up heads. The probability of three consecutive heads is 0.5 × 0.5 × 0.5 = 0.125. When the fourth toss comes up heads, the probability is now 0.5 × 0.5 × 0.5 × 0.5 = 0.062. Finally, a fifth toss comes up heads. The probability of five consecutive heads is 0.031. Notice that this probability is lower than 0.05.

At this point, the observations have reached the 95% confidence level. We can now conclude that the observations are not consistent with the hypothesis that the coin is fair. (Notice that we haven’t proved the coin is unfair.) More precisely, we can claim that

The observations are not consistent with the hypothesis that the coin is fair at the 95% confidence level.

We now have reason to discard the null hypothesis.

Having discarded the null hypothesis, we can now infer that the results are, instead, consistent with the original research hypothesis. We say that

The observations are consistent with the hypothesis that the coin is unfair at the 95% confidence level.

We can say that the results are statistically significant.

Notes

  1. Statistical significance is mathematically related to, but not the same as, the confidence level. A confidence level of 95% correspondents to a significance level of 0.05. A confidence level of 99% correspondents to a significance level of 0.01.
  2. Statistical significance is represented by the lower-case Greek letter alpha (α). When we set the confidence level (and therefore the significance level), it is common to say we have “set alpha to …”
  3. A result is said to be statistically significant when the probability of the observations for the null hypothesis is lower than the significance level.
  4. Unlike everyday speech, the word “significant” in statistics is not the same as importance. Just because a result is statistically “significant” doesn’t mean that it is important.
  5. Recall that the confidence level is a way of “drawing a line in the sand.” The purpose is to make it clear when our idea has failed. In statistical tests, statisticians don’t pay attention to how close the results are to the line. Our sole criterion is “which side of the line” the observations fall. A set of observations that exhibit a p value of 0.00001, is not “more significant” than 0.01. However, if we set an a priori significance level of 0.00001, then positive results would mean we have achieved a higher confidence level. We’ll have more to say about this later.