
Transformation of Skewed Data Distribution in Health Care

Robert F. Hart, Ph.D.
Marilyn K. Hart, Ph.D.

Before one can make a valid control chart for variables data (also known as measurement data or continuous data), the data distribution must be "near-normal" [Testing for "Near-Normality...", September 2004]. In health care, however, the data distribution is commonly asymmetrical, with a long tail of high values. Such a distribution is said to be "skewed to the right": instead of tending to lie along a straight line, the points on the probability plot tend to fall along a smooth curve that is convex upward.

Consider, for example, surgery times. There is some minimum time that a surgery will take, and it obviously cannot be less than 0 minutes, but there is no real upper bound. Table 1 gives the surgery times for 50 consecutive procedures. Such data are typically severely skewed to the right, as is the case here. This is reflected in the probability plot, Figure 1.

Table 1. Surgery Times

Surgery   Surgery Time     Surgery   Surgery Time
Number    (in minutes)     Number    (in minutes)
   1           75             26         130
   2           65             27         115
   3          165             28          75
   4           60             29          95
   5           75             30          80
   6           75             31         125
   7           85             32         105
   8           80             33          70
   9           85             34          72
  10           95             35          95
  11           65             36          90
  12           65             37         120
  13           85             38          75
  14           68             39          90
  15          190             40          85
  16          120             41          90
  17          105             42          80
  18          115             43         115
  19           58             44          80
  20           70             45          65
  21           80             46          70
  22           75             47         485
  23           65             48          75
  24           75             49          90
  25           90             50         120

Figure 1. Probability Plot of Surgery Times in Minutes
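
The coordinates behind a probability plot such as Figure 1 can be sketched in a few lines of Python. This is only an illustration, not the software used to draw the figure. The plotting position (i - 0.5)/n and the use of a correlation coefficient as a numerical measure of "straightness" are assumptions made here; the article itself judges near-normality by eye from the plot.

```python
import math
from statistics import NormalDist, mean

# Surgery times in minutes, in procedure order (Table 1).
surgery_minutes = [
    75, 65, 165, 60, 75, 75, 85, 80, 85, 95,
    65, 65, 85, 68, 190, 120, 105, 115, 58, 70,
    80, 75, 65, 75, 90, 130, 115, 75, 95, 80,
    125, 105, 70, 72, 95, 90, 120, 75, 90, 85,
    90, 80, 115, 80, 65, 70, 485, 75, 90, 120,
]


def probability_plot_points(values):
    """Pair each sorted value with a standard-normal score.

    Assumption: plotting position (i - 0.5) / n; other conventions exist.
    """
    ordered = sorted(values)
    n = len(ordered)
    scores = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(scores, ordered))


def straightness(points):
    """Pearson correlation between the normal scores and the ordered data.

    Values near 1 mean the points lie close to a straight line.
    """
    zs = [z for z, _ in points]
    xs = [x for _, x in points]
    z_bar, x_bar = mean(zs), mean(xs)
    sxz = sum((z - z_bar) * (x - x_bar) for z, x in points)
    szz = sum((z - z_bar) ** 2 for z in zs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxz / math.sqrt(szz * sxx)


points = probability_plot_points(surgery_minutes)
r_raw = straightness(points)
print(f"straightness of raw surgery times: {r_raw:.3f}")

# The three largest times sit far above the line the other points suggest.
for z, x in points[-3:]:
    print(f"normal score {z:+.2f} -> {x} minutes")
```

Plotting the pairs in `points` (normal score on one axis, minutes on the other) gives a probability plot of this kind; the single 485-minute procedure dominates the upper end of the curve.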

A convenient way to handle this problem of data skewed to the right is to "transform" the data into a data set which has a near-normal distribution [Hart and Hart, 2002]. This may often be accomplished by raising the data to a particular power chosen so that the transformed data meet the near-normality assumption (where the zero power is taken to be the natural logarithm). This family of transformations may be expressed as transf(X) = X^p, where X and transf(X) are the original and transformed data respectively and p is the power to which X is raised. The more severely X is skewed to the right, the lower the value of p required to obtain a near-normal transformation. A suitable transformation is found by trial and error. Experience has shown that one of three trial transformations will often be satisfactory: p = 0.25 (the fourth root), the natural logarithm, or p = -1 (the reciprocal).
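
The trial-and-error search can be sketched as follows for the surgery times of Table 1. The helper names `power_transform` and `straightness` are hypothetical, and scoring each trial by the correlation between the ordered data and normal scores (plotting position (i - 0.5)/n) is an assumption made for this sketch; the article's own criterion is a visual check of the probability plot.

```python
import math
from statistics import NormalDist, mean

# Surgery times in minutes, in procedure order (Table 1).
surgery_minutes = [
    75, 65, 165, 60, 75, 75, 85, 80, 85, 95,
    65, 65, 85, 68, 190, 120, 105, 115, 58, 70,
    80, 75, 65, 75, 90, 130, 115, 75, 95, 80,
    125, 105, 70, 72, 95, 90, 120, 75, 90, 85,
    90, 80, 115, 80, 65, 70, 485, 75, 90, 120,
]


def power_transform(x, p):
    """Return x**p, with p == 0 taken to be the natural logarithm."""
    return math.log(x) if p == 0 else x ** p


def straightness(values):
    """Correlation between sorted values and standard-normal scores.

    Assumption: plotting position (i - 0.5) / n. Values near 1 indicate
    a nearly straight probability plot.
    """
    ordered = sorted(values)
    n = len(ordered)
    scores = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    z_bar, x_bar = mean(scores), mean(ordered)
    sxz = sum((z - z_bar) * (x - x_bar) for z, x in zip(scores, ordered))
    szz = sum((z - z_bar) ** 2 for z in scores)
    sxx = sum((x - x_bar) ** 2 for x in ordered)
    return sxz / math.sqrt(szz * sxx)


# p = 1 is the untransformed data, included for comparison with the
# three trial transformations named in the text.
trial_powers = [1, 0.25, 0, -1]
results = {
    p: straightness([power_transform(x, p) for x in surgery_minutes])
    for p in trial_powers
}

for p, r in results.items():
    label = "natural log" if p == 0 else f"p = {p}"
    print(f"{label:>12}: straightness = {r:.3f}")
```

A score like this is only a screening aid; the probability plot of each candidate should still be inspected before a transformation is adopted.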

The reader is encouraged to make a number of trial transformations to become familiar with the method. The transformation chosen here is transf(X) = X^(-1), that is, X to the -1 power, which is the reciprocal of X. This transformation provides the needed near-normal distribution (as shown in Figure 2) and makes physical sense in that the transformed variable is expressed in procedures per minute, where the original data were expressed in minutes per procedure.
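
A short sketch of the reciprocal transformation applied to the Table 1 data shows the change of units and one side effect worth keeping in mind: the reciprocal reverses the ordering, so the longest surgery becomes the smallest transformed value. The moment coefficient of skewness used below is a summary chosen for this illustration only; it is not a statistic the article relies on.

```python
from statistics import mean

# Surgery times in minutes, in procedure order (Table 1).
surgery_minutes = [
    75, 65, 165, 60, 75, 75, 85, 80, 85, 95,
    65, 65, 85, 68, 190, 120, 105, 115, 58, 70,
    80, 75, 65, 75, 90, 130, 115, 75, 95, 80,
    125, 105, 70, 72, 95, 90, 120, 75, 90, 85,
    90, 80, 115, 80, 65, 70, 485, 75, 90, 120,
]


def skewness(values):
    """Moment coefficient of skewness, m3 / m2**1.5 (no small-sample correction)."""
    centre = mean(values)
    m2 = mean([(v - centre) ** 2 for v in values])
    m3 = mean([(v - centre) ** 3 for v in values])
    return m3 / m2 ** 1.5


# transf(X) = X^(-1): minutes per procedure become procedures per minute.
procedures_per_minute = [1 / x for x in surgery_minutes]

# Order is reversed: the 485-minute surgery gives the smallest rate.
print(f"485 minutes -> {1 / 485:.5f} procedures per minute (smallest value)")
print(f" 58 minutes -> {1 / 58:.5f} procedures per minute (largest value)")

print(f"skewness, minutes per procedure: {skewness(surgery_minutes):.2f}")
print(f"skewness, procedures per minute: {skewness(procedures_per_minute):.2f}")
```

Because of the order reversal, a long right tail in minutes shows up as a much shorter left tail in procedures per minute.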

Figure 2. Probability Plot of Transformed Surgery Times, Procedures per Minute

A future paper will consider the control charts of the original data and the transformed data, and will explore the "back-transformation" of the results to make a valid control chart in the original units, minutes per procedure.

References

M. Hart and R. Hart, Statistical Process Control for Health Care, Pacific Grove, CA: Duxbury, 2002.
M. Hart and R. Hart, "Testing for 'Near-Normality': The Probability Plot," Statit Bulletin, September 2004.
