Transformation of Skewed Data Distribution in Health Care
Robert F. Hart, Ph.D.
Marilyn K. Hart, Ph.D.
Before one can make a valid control chart for
variables data (a.k.a. measurement data or continuous
data), the data distribution must be
"near-normal" ["Testing for 'Near-Normality'...",
September 2004]. However, it is common in
health care for the data distribution to be
asymmetric, with a long tail of high values.
Such a distribution is said to be "skewed
to the right." Instead of tending to lie along
a straight line, the points on the probability
plot tend to fall along a smooth curve which
is convex upward.
Consider, for example, surgery times. There
is some minimum time that the surgery will take,
and it obviously cannot take less than 0 minutes.
However, there is no real upper bound. Table
1 gives the surgery times for 50 consecutive
procedures. Such data are typically severely
skewed to the right as is the case here. This
is reflected in the probability plot, Figure
1.
Table 1. Surgery Times

Surgery   Surgery Time     Surgery   Surgery Time
Number    (in minutes)     Number    (in minutes)
-------   ------------     -------   ------------
   1           75             26          130
   2           65             27          115
   3          165             28           75
   4           60             29           95
   5           75             30           80
   6           75             31          125
   7           85             32          105
   8           80             33           70
   9           85             34           72
  10           95             35           95
  11           65             36           90
  12           65             37          120
  13           85             38           75
  14           68             39           90
  15          190             40           85
  16          120             41           90
  17          105             42           80
  18          115             43          115
  19           58             44           80
  20           70             45           65
  21           80             46           70
  22           75             47          485
  23           65             48           75
  24           75             49           90
  25           90             50          120

Figure 1. Probability Plot of Surgery Times in Minutes
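The points behind a probability plot such as Figure 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the plotting positions (i - 0.5) / n, the use of a correlation coefficient as a numeric summary of straightness, and all function names are assumptions made here.

```python
from math import sqrt
from statistics import NormalDist, mean

# Surgery times in minutes for surgeries 1-50, copied from Table 1.
SURGERY_TIMES = [
    75, 65, 165, 60, 75, 75, 85, 80, 85, 95,
    65, 65, 85, 68, 190, 120, 105, 115, 58, 70,
    80, 75, 65, 75, 90, 130, 115, 75, 95, 80,
    125, 105, 70, 72, 95, 90, 120, 75, 90, 85,
    90, 80, 115, 80, 65, 70, 485, 75, 90, 120,
]


def normal_scores(n):
    """Standard-normal quantiles at the assumed plotting positions (i - 0.5) / n."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def probability_plot_points(data):
    """Pair each ordered observation with its normal score."""
    ordered = sorted(data)
    return list(zip(normal_scores(len(ordered)), ordered))


def plot_correlation(data):
    """Pearson correlation of the plotted points; near 1 means a nearly straight plot."""
    points = probability_plot_points(data)
    z_bar = mean(z for z, _ in points)
    x_bar = mean(x for _, x in points)
    s_zx = sum((z - z_bar) * (x - x_bar) for z, x in points)
    s_zz = sum((z - z_bar) ** 2 for z, _ in points)
    s_xx = sum((x - x_bar) ** 2 for _, x in points)
    return s_zx / sqrt(s_zz * s_xx)


print(f"plot correlation, raw surgery times: {plot_correlation(SURGERY_TIMES):.3f}")
```

A correlation well below 1 is consistent with the curved pattern described above, but it is only a rough numeric companion to the plot and does not replace inspecting it.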
A convenient way to handle this problem of
data skewed to the right is to "transform"
the data into a data set which has a near-normal
distribution [Hart and Hart, 2002]. This may
often be accomplished by raising the data to
a particular power (where the zero power is
taken to be the natural logarithm). This family
of transformations may be expressed as
transf(X) = X^p, where X and transf(X) are the
original and transformed data respectively and
p is the power to which X is raised. The more
severely X is skewed to the right, the lower
the value of p required to obtain a near-normal
transformation. A suitable transformation is
found by trial and error. Experience has shown
that one of three trial transformations will
often be satisfactory: p = 0.25 (the fourth
root), p = 0 (the natural logarithm), or
p = -1 (the reciprocal).
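The trial-and-error search over the power family can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the moment coefficient of skewness is an assumed stand-in for judging each trial by its probability plot, which is the criterion the article actually uses. The three trial powers are the fourth root (p = 0.25), the natural logarithm (p = 0), and the reciprocal (p = -1); p = 1 is the untransformed data, included for comparison.

```python
from math import log
from statistics import mean

# Surgery times in minutes for surgeries 1-50, copied from Table 1.
SURGERY_TIMES = [
    75, 65, 165, 60, 75, 75, 85, 80, 85, 95,
    65, 65, 85, 68, 190, 120, 105, 115, 58, 70,
    80, 75, 65, 75, 90, 130, 115, 75, 95, 80,
    125, 105, 70, 72, 95, 90, 120, 75, 90, 85,
    90, 80, 115, 80, 65, 70, 485, 75, 90, 120,
]


def transf(x, p):
    """Power-family transformation X**p, with p = 0 taken as the natural logarithm."""
    return log(x) if p == 0 else x ** p


def skewness(data):
    """Moment coefficient of skewness: 0 if symmetric, positive for a long right tail."""
    m = mean(data)
    m2 = mean([(x - m) ** 2 for x in data])
    m3 = mean([(x - m) ** 3 for x in data])
    return m3 / m2 ** 1.5


# Try the untransformed data (p = 1) and the three trial powers.
for p in (1, 0.25, 0, -1):
    transformed = [transf(x, p) for x in SURGERY_TIMES]
    print(f"p = {p:>5}: skewness = {skewness(transformed):+.2f}")
```

Note that a negative power reverses the ordering of the data, so the long tail of high surgery times becomes the low end of the transformed values. A skewness closer to zero suggests a more symmetric distribution, but the probability plot of each trial should still be examined.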
The reader is encouraged to make a number of
trial transformations to become familiar with
the method. The transformation chosen here is
transf(X) = X^(-1), X to the -1 power,
which is the reciprocal of X. This transformation
provides the needed near-normal distribution
(as shown in Figure 2) and makes physical sense
in that the transformed variable is procedures
per minute where the original data were expressed
in minutes per procedure.
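As a worked illustration of the reciprocal transformation and its units, using surgeries 1, 19, and 47 from Table 1 (the rounding is chosen here for display):

```latex
\mathrm{transf}(X) = X^{-1}:\qquad
\begin{aligned}
 58 \ \text{min/procedure} &\;\mapsto\; \tfrac{1}{58}  \approx 0.0172 \ \text{procedures/min},\\
 75 \ \text{min/procedure} &\;\mapsto\; \tfrac{1}{75}  \approx 0.0133 \ \text{procedures/min},\\
485 \ \text{min/procedure} &\;\mapsto\; \tfrac{1}{485} \approx 0.0021 \ \text{procedures/min}.
\end{aligned}
```

The shortest surgery (58 minutes) gives the largest transformed value and the longest (485 minutes) gives the smallest, so the reciprocal reverses the ordering while pulling the extreme value much closer to the rest of the data.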
Figure 2. Probability Plot of Transformed Surgery
Times, Procedures per Minute
A future paper will consider the control charts
of the original data and the transformed data,
and will explore the "back-transformation"
of the results to make a valid control chart
in the original units, minutes per procedure.
References
M. Hart and R. Hart, Statistical Process Control
for Health Care, Pacific Grove, CA: Duxbury,
2002.
M. Hart and R. Hart, "Testing for 'Near-Normality':
The Probability Plot," Statit Bulletin,
September 2004.
