Control Charts for Skewed Data

Robert F. Hart, Ph.D.
Marilyn K. Hart, Ph.D.

When encountering the common problem of highly skewed variables data [1], the first step is to transform the data to attain a "near-normal" distribution [2]. In the example of 50 consecutive surgery times from the more recent reference [2], the inverse transformation (i.e., the reciprocal) of the surgery times satisfied the need for near-normality.
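As a rough numerical illustration of what the reciprocal does to a right-skewed sample, the sketch below compares the moment-based sample skewness before and after the transformation. The `times` values are hypothetical, not the 50 surgery times from the reference, and skewness is only a quick stand-in here; the cited bulletins judge near-normality with a probability plot.

```python
from statistics import mean

def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5 (zero for symmetric data)."""
    m = mean(values)
    m2 = mean([(v - m) ** 2 for v in values])
    m3 = mean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5

# Hypothetical surgery times in minutes per procedure (illustration only).
times = [60, 75, 50, 90, 65, 240, 70, 55, 80, 100, 62, 85]

# Inverse transformation: procedures per minute.
rates = [1 / t for t in times]

print("skewness, original:  ", round(skewness(times), 2))
print("skewness, reciprocal:", round(skewness(rates), 2))
```

For this made-up sample the original times are strongly right-skewed, while the reciprocals are much closer to symmetric.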

The next step is to make control charts to determine whether the process is stable over time. If the I chart on the original surgery times in minutes per procedure (Figure 1) were made in spite of the fact that the data were severely skewed, two points would be found above the upper control limit. The cause(s) of these outages cannot be determined from this chart. They may be because of the skewed distribution, or because the process is not stable over time, or both.

Figure 1. I Chart on the Original Data (Surgery Time in Minutes per Procedure)

The cause of the outages in Figure 1 is made clear by the I chart on the transformed data, Figure 2, where the plotted values are now in procedures per minute (rather than minutes per procedure) owing to the inverse transformation. Since there is no evidence of instability over time in Figure 2, one may infer that the process is stable over time and that the outages in Figure 1 were solely due to the skewed distribution. Be aware that because of the inverse transformation, the three high points in Figure 1 are the three low points in Figure 2.

Figure 2. I Chart on Transformed Data (Procedures per Minute)
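The comparison between the two charts can be sketched in a few lines. The article does not state how its limits were calculated, so this sketch assumes the conventional individuals-chart formula (mean plus or minus 2.66 times the average moving range) and uses hypothetical surgery times rather than the 50 values from the reference.

```python
from statistics import mean

def i_chart_limits(values):
    """Return (lcl, center, ucl) for an individuals (I) chart.

    Assumes the conventional moving-range method:
    limits = mean +/- 2.66 * average moving range, where 2.66 = 3 / 1.128.
    """
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    half_width = 2.66 * mean(moving_ranges)
    return center - half_width, center, center + half_width

def out_of_limits(values):
    """Return the indices of points outside the I chart control limits."""
    lcl, _, ucl = i_chart_limits(values)
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]

# Hypothetical surgery times in minutes per procedure (illustration only).
times = [60, 75, 50, 90, 65, 240, 70, 55, 80, 100, 62, 85]

# Inverse transformation: procedures per minute.
rates = [1 / t for t in times]

print("outages on original data:   ", out_of_limits(times))
print("outages on transformed data:", out_of_limits(rates))
```

With this made-up sample, the long 240-minute procedure falls above the upper limit on the original scale, while no point falls outside the limits on the reciprocal scale, which mirrors the pattern described for Figures 1 and 2.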

If this is to be only a retrospective study to determine stability, the task might be considered complete. However, if one wants to look at the process in the original units, minutes per procedure, the I chart in Figure 3 is required. Here the plotted points are the same as in Figure 1, but the control limits are found by "back-transforming" the limits in Figure 2. For example, the UPPER control limit for Figure 3 (280.11 minutes per procedure) is 1/(0.00357 procedures per minute), where 0.00357 procedures per minute is the LOWER control limit of Figure 2.
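The back-transformation is a one-line calculation, but the swap of upper and lower limits is easy to get wrong, so a small sketch may help. The function name `back_transform_limits` is hypothetical. Only the 0.00357 lower limit comes from the article; the 0.02 upper limit is a placeholder, since the article does not quote the upper limit of Figure 2.

```python
import math

def back_transform_limits(lcl_t, ucl_t):
    """Map reciprocal-scale control limits back to the original units.

    The reciprocal reverses order, so the LOWER limit on the transformed
    scale gives the UPPER limit on the original scale, and vice versa.
    Returns (lcl, ucl) in original units.
    """
    if ucl_t <= 0:
        raise ValueError("transformed upper limit must be positive")
    # A zero or negative transformed lower limit cannot be inverted
    # meaningfully; treat it as "no finite upper limit".
    ucl = 1 / lcl_t if lcl_t > 0 else math.inf
    lcl = 1 / ucl_t
    return lcl, ucl

# 0.00357 procedures per minute is the lower limit quoted for Figure 2.
# 0.02 is a hypothetical placeholder for the unquoted upper limit.
lcl, ucl = back_transform_limits(0.00357, 0.02)
print(f"UCL in minutes per procedure: {ucl:.2f}")
```

Treating a non-positive transformed lower limit as "no finite upper limit" is a choice made for this sketch, not something the article prescribes.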

Figure 3 may be easier to explain to others than Figure 2. It would also be preferred for ongoing process control, because the points are plotted as measured rather than requiring the reciprocal of each value to be taken before plotting.

Figure 3. I Chart on the Original Data (Minutes per Procedure) with the Back-Transformed Control Limits

Note that even for the Xbar chart, the calculation of the control limits rests on an underlying normality assumption. The chart is fairly robust, so if the data are close to being normally distributed, the control chart may still work satisfactorily. However, if the data are severely skewed, the control chart may give false indications of lack of control.

References

[1] M. Hart and R. Hart. "Testing for 'Near-Normality': the Probability Plot", Statit Bulletin, September 2004.
[2] M. Hart and R. Hart. "Transformation of Skewed Data Distributions in Health Care", Statit Bulletin, January 2005.

For more information, contact Drs. Robert and Marilyn Hart at robthart@aol.com or (541)412-0425.

If you would like additional information, please send email to statit.support@acs-inc.com. 