Transformation of Skewed Data Distribution in Health Care

Robert F. Hart, Ph.D.
Marilyn K. Hart, Ph.D.

Before one can make a valid control chart for variables data (also known as measurement data or continuous data), the data distribution must be "near-normal" [Hart and Hart, "Testing for 'Near-Normality'...", September 2004]. However, it is common in health care for the data distribution to be asymmetrical, with a long tail of high values. Such a distribution is said to be "skewed to the right." Instead of tending to lie along a straight line, the points on its probability plot tend to fall along a smooth curve that is convex upward.

Consider, for example, surgery times. There is some minimum time that the surgery will take, and it obviously cannot take less than 0 minutes. However, there is no real upper bound. Table 1 gives the surgery times for 50 consecutive procedures. Such data are typically severely skewed to the right, as is the case here: 47 of the 50 times fall between 58 and 130 minutes, while one procedure took 485 minutes. This skew is reflected in the probability plot, Figure 1.

Table 1. Surgery Times

Surgery   Surgery Time   Surgery   Surgery Time
Number    (minutes)      Number    (minutes)
1         75             26        130
2         65             27        115
3         165            28        75
4         60             29        95
5         75             30        80
6         75             31        125
7         85             32        105
8         80             33        70
9         85             34        72
10        95             35        95
11        65             36        90
12        65             37        120
13        85             38        75
14        68             39        90
15        190            40        85
16        120            41        90
17        105            42        80
18        115            43        115
19        58             44        80
20        70             45        65
21        80             46        70
22        75             47        485
23        65             48        75
24        75             49        90
25        90             50        120


Figure 1. Probability Plot of Surgery Times in Minutes
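
The skew visible in the probability plot can also be checked numerically. The sketch below is not part of the original analysis. It computes the mean, the median, and a simple moment-based skewness for the Table 1 data using only the Python standard library. The helper name sample_skewness and the choice of the moment estimator are assumptions made for illustration. A mean well above the median, together with a positive skewness, points to a long tail of high values.

```python
from statistics import mean, median

# Surgery times in minutes, surgeries 1 through 50 (Table 1).
SURGERY_TIMES = [
    75, 65, 165, 60, 75, 75, 85, 80, 85, 95,
    65, 65, 85, 68, 190, 120, 105, 115, 58, 70,
    80, 75, 65, 75, 90, 130, 115, 75, 95, 80,
    125, 105, 70, 72, 95, 90, 120, 75, 90, 85,
    90, 80, 115, 80, 65, 70, 485, 75, 90, 120,
]


def sample_skewness(data):
    """Moment-based skewness m3 / m2**1.5 (illustrative helper, an assumption)."""
    n = len(data)
    center = mean(data)
    m2 = sum((x - center) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - center) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    print("n        =", len(SURGERY_TIMES))
    print("mean     =", round(mean(SURGERY_TIMES), 2))
    print("median   =", median(SURGERY_TIMES))
    print("skewness =", round(sample_skewness(SURGERY_TIMES), 2))
```

A skewness near zero would suggest a roughly symmetric distribution; this statistic is only a rough companion to the probability plot, not a replacement for it.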

A convenient way to handle data skewed to the right is to "transform" the data into a data set that has a near-normal distribution [Hart and Hart, 2002]. This can often be accomplished by raising the data to a suitable power (where the zero power is taken, by convention, to be the natural logarithm). This family of transformations may be expressed as transf(X) = X^p, where X and transf(X) are the original and transformed data, respectively, and p is the power to which X is raised. The more severely X is skewed to the right, the lower the value of p required to obtain a near-normal transformation. A suitable transformation is found by trial and error. Experience has shown that one of three trial transformations will often be satisfactory: p = 0.25 (the fourth root), p = 0 (the natural logarithm), or p = -1 (the reciprocal).
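
As a minimal sketch of this family of transformations, the Python function below applies a chosen power p to a single positive value, treating p = 0 as the natural logarithm. The name transf mirrors the notation in the text; the TRIAL_POWERS list and the restriction to positive inputs are assumptions made for illustration.

```python
import math


def transf(x, p):
    """Return x raised to the power p, with p == 0 meaning ln(x).

    Defined here only for x > 0, which holds for durations such as
    surgery times.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    if p == 0:
        return math.log(x)
    return x ** p


# The three trial transformations named in the text, as (label, p) pairs.
TRIAL_POWERS = [("fourth root", 0.25), ("natural log", 0), ("reciprocal", -1)]

if __name__ == "__main__":
    x = 75  # minutes, the first surgery time in Table 1
    for label, p in TRIAL_POWERS:
        print(f"{label:12s} p={p:<5} transf({x}) = {transf(x, p):.5f}")
```

Applying transf to every value in a data set with the same p gives the transformed data set, whose probability plot can then be examined.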

The reader is encouraged to make a number of trial transformations to become familiar with the method. The transformation chosen here is transf(X) = X^(-1), the reciprocal of X. This transformation provides the needed near-normal distribution (as shown in Figure 2) and makes physical sense: the original data were expressed in minutes per procedure, so the transformed variable is in procedures per minute.


Figure 2. Probability Plot of Transformed Surgery Times, Procedures per Minute
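
For readers who want to try the trial transformations on the Table 1 data, the sketch below scores each one by the correlation between the sorted transformed values and the corresponding standard normal quantiles. A value closer to 1 means the points on the probability plot lie closer to a straight line. This correlation is a numerical stand-in for judging the plot by eye and is not the method used by the authors. The function names and the plotting positions (i - 0.5) / n are assumptions, the latter being one of several common conventions.

```python
import math
from statistics import NormalDist, mean

# Surgery times in minutes, surgeries 1 through 50 (Table 1).
SURGERY_TIMES = [
    75, 65, 165, 60, 75, 75, 85, 80, 85, 95,
    65, 65, 85, 68, 190, 120, 105, 115, 58, 70,
    80, 75, 65, 75, 90, 130, 115, 75, 95, 80,
    125, 105, 70, 72, 95, 90, 120, 75, 90, 85,
    90, 80, 115, 80, 65, 70, 485, 75, 90, 120,
]


def transf(x, p):
    """Power transformation of a positive x; p == 0 means the natural log."""
    return math.log(x) if p == 0 else x ** p


def probability_plot_correlation(data):
    """Pearson correlation between sorted data and standard normal quantiles."""
    n = len(data)
    ordered = sorted(data)
    # Plotting positions (i - 0.5) / n: an assumed, commonly used convention.
    scores = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, my = mean(ordered), mean(scores)
    sxy = sum((x - mx) * (y - my) for x, y in zip(ordered, scores))
    sxx = sum((x - mx) ** 2 for x in ordered)
    syy = sum((y - my) ** 2 for y in scores)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    trials = [("original", 1), ("fourth root", 0.25),
              ("natural log", 0), ("reciprocal", -1)]
    for label, p in trials:
        transformed = [transf(x, p) for x in SURGERY_TIMES]
        r = probability_plot_correlation(transformed)
        print(f"{label:12s} p={p:<5} r = {r:.3f}")
```

Because the values are sorted after transformation, the score is unaffected by the reversal of order that a negative power produces.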

A future paper will consider the control charts of the original data and of the transformed data, and will explore the "back-transformation" of the results to make a valid control chart in the original units, minutes per procedure.

References

M. Hart and R. Hart, Statistical Process Control for Health Care, Pacific Grove, CA: Duxbury, 2002.
M. Hart and R. Hart, "Testing for 'Near-Normality': the Probability Plot", Statit Bulletin, September 2004.