


The t Test

Bill Farrell, Ph.D.
Senior Analyst
Sutter Health

Sutter Health is a family of not-for-profit hospitals and physician organizations that share resources and expertise to advance health care quality. Serving more than 100 communities in Northern California, Sutter Health is a regional leader in cardiac care, cancer treatment, orthopedics, obstetrics, and newborn intensive care, and is a pioneer in advanced patient safety technology.

In this brief article we take a look at the legendary t test, the go-to test for comparing group means when making inferences with measured (as opposed to counted) data.

Suppose we administer an IQ test to two groups of people. The mean score for the first group is 100, and for the second group it's 105. Is this 5-point difference statistically significant? To answer that, we'd want to know a little more about the variability of scores within the groups. Consider two scenarios:

Scenario 1: The range of scores for the first group of people is 99-101, and the range of scores for the second group is 104-106. In this scenario, the difference is very likely to be significant. The scores are tightly clustered around their means, and the distributions do not overlap (that is, the highest score in the first group is 101, while the lowest score in the second group is 104).

Scenario 2: The range of scores for the first group is 65-170, and for the second group it's 67-171. In this scenario it is highly unlikely that a 5-point difference would turn out to be significant. There is huge variability of scores around their means, and the distributions overlap almost completely.

This is the logic behind the t test: we calculate the difference between the means of two groups of scores (5 points in the example above) and then look at the variability of the scores to determine whether the difference is statistically significant.
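As a rough sketch of that logic, the Python below computes a two-sample t statistic: the difference between the group means divided by its standard error, which is built from each group's standard deviation and size. The score lists are hypothetical, invented only so that each group has the mean and range described in the two scenarios; the article gives no actual data. The formula is Welch's unequal-variance form, which for equal group sizes yields the same statistic as the classic pooled-variance t test. Converting t into a p-value requires the t distribution (for example via `scipy.stats.ttest_ind`), which this standard-library-only sketch leaves out.

```python
from math import sqrt
from statistics import mean, stdev


def welch_t(group_a, group_b):
    """Welch's t statistic: difference in means divided by its standard error."""
    diff = mean(group_b) - mean(group_a)
    # Standard error of the difference, using each group's sample variance.
    se = sqrt(stdev(group_a) ** 2 / len(group_a) + stdev(group_b) ** 2 / len(group_b))
    return diff / se


# Hypothetical scores, made up only to match the means and ranges in the text.
scenario1_a = [99, 100, 100, 100, 101]   # mean 100, range 99-101
scenario1_b = [104, 105, 105, 105, 106]  # mean 105, range 104-106
scenario2_a = [65, 80, 100, 85, 170]     # mean 100, range 65-170
scenario2_b = [67, 85, 105, 97, 171]     # mean 105, range 67-171

print(f"Scenario 1: t = {welch_t(scenario1_a, scenario1_b):.2f}")
print(f"Scenario 2: t = {welch_t(scenario2_a, scenario2_b):.2f}")
```

With these made-up scores, Scenario 1 gives a t of about 11 while Scenario 2 gives a t of about 0.2: the same 5-point difference, swamped by variability in the second case.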

We used the range above to describe the variability of the scores (e.g., the range for one group was 65-170). Statisticians (of course) have decided that the standard deviation is a much more useful measure of variability.

You can think of the standard deviation as (sort of) the average distance of a group of scores from its mean. If we have scores of 1, 2, 3, 4, and 5, the mean is 3 and the distances from the mean (ignoring sign) are 2, 1, 0, 1, and 2. Thus the average distance from the mean is 6/5 = 1.2. This is not the actual standard deviation, but it's close and it nicely illustrates the concept.
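The pseudo-standard deviation described above is what statisticians call the mean absolute deviation. A short Python sketch, using only the standard library, compares it with the real thing: `statistics.pstdev` divides the summed squared distances by n and `statistics.stdev` divides by n - 1, each then taking a square root. The sample version (`stdev`) is the one normally plugged into a t test. The helper name `mean_abs_deviation` is our own, chosen for illustration.

```python
from statistics import mean, pstdev, stdev


def mean_abs_deviation(scores):
    """Average distance of the scores from their mean, ignoring sign."""
    m = mean(scores)
    return sum(abs(x - m) for x in scores) / len(scores)


scores = [1, 2, 3, 4, 5]
print(mean_abs_deviation(scores))  # 1.2, the "pseudo" standard deviation
print(round(pstdev(scores), 3))    # 1.414, population SD (divides by n)
print(round(stdev(scores), 3))     # 1.581, sample SD (divides by n - 1)
```

Both true standard deviations land in the same neighborhood as 1.2; squaring the distances simply gives extra weight to scores far from the mean.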

What would our pseudo-standard deviation be if the scores were 1, 2, 3, 4, and 10? (Hint: it's 2.4.) Just by changing a 5 to a 10 we've doubled the variability of our scores.
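The hint can be checked with a few lines of Python. This is a minimal sketch; `mean_abs_deviation` is a hypothetical helper that computes the average distance from the mean, ignoring sign. Note that changing the 5 to a 10 also moves the mean from 3 to 4, so every distance changes, not just the last one: 3, 2, 1, 0, and 6, which sum to 12.

```python
from statistics import mean


def mean_abs_deviation(scores):
    """Average distance of the scores from their mean, ignoring sign."""
    m = mean(scores)
    return sum(abs(x - m) for x in scores) / len(scores)


before = mean_abs_deviation([1, 2, 3, 4, 5])   # 6 / 5
after = mean_abs_deviation([1, 2, 3, 4, 10])   # 12 / 5
print(before, after)                  # 1.2 2.4
print(f"ratio: {after / before:.1f}")  # ratio: 2.0
```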

So the t test tells us whether a difference between two groups is statistically significant by looking at the size of the difference and the variability of the scores, where variability can be viewed (roughly) as how different scores are from each other or from the mean. For groups of a given size, the greater the variability, the greater the difference must be to achieve statistical significance.
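To see this trade-off numerically, the sketch below holds the difference fixed at 5 points and the group size at a hypothetical 30 people per group, then varies the standard deviation. It assumes two equal-sized groups sharing the same standard deviation, so the standard error of the difference is `sd * sqrt(2 / n)`; the function name `t_from_summary` and all the numbers are illustrative, not taken from the article.

```python
from math import sqrt


def t_from_summary(diff, sd, n):
    """t for two equal-sized groups (n each) with a common standard deviation sd."""
    standard_error = sd * sqrt(2 / n)
    return diff / standard_error


# Hypothetical: a 5-point difference with 30 people per group.
for sd in (2, 5, 15, 30):
    print(f"sd = {sd:>2}: t = {t_from_summary(5, sd, 30):.2f}")
```

As a rough guide, with about 30 people per group a t of around 2 is needed for significance at the conventional two-sided 5% level, so only the two smallest standard deviations (t of about 9.7 and 3.9) clear that bar; at sd = 15 and sd = 30 the same 5-point difference falls short.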

This should provide some insight as to why researchers take such pains to match treatment and control groups on age, gender, health status, and the like. Homogeneous groups are likely to be low-variability groups, and low-variability groups give us our best shot at discerning subtle differences between them.