Discussions on Normality
What is normal and how normal does a distribution
need to be for a control chart to be effective?
The answer, surprisingly, may be "not very".
In this article, I will look at this issue
from two directions: first, published work
on the effect of skewness and kurtosis
on the control chart constants and second,
the tσ limits in relation to
Shewhart's early work.
Wheeler (2004) refers to Burr's work with
skewed distributions, in which Burr found that
the theoretical values of d_{2} and d_{3}
for these distributions did not differ significantly
from those of a normal distribution (skewness
= 0 and kurtosis = 3). As summarized by Wheeler,
Burr indicated that even highly skewed and long-tailed
distributions could be analyzed with a control
chart using the "normal" values of
d_{2} and d_{3}. Wheeler reproduces several of Burr's
distributions as well as graphs of the
d_{2} and d_{3} values for the
skewed distributions. The graphs indicate
differences that are less than significant.
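
To get a feel for what d_{2} and d_{3} measure, here is a rough Monte Carlo sketch. It estimates d_{2} = E[R]/σ and d_{3} = SD[R]/σ, where R is the range of a subgroup of size n. The exponential distribution, the subgroup size of 5, the seed and the function name are my own illustrative assumptions. They are not Burr's distributions or his method, and the exponential is a deliberately extreme case.

```python
import random
import statistics

def estimate_d2_d3(draw, sigma, n=5, subgroups=20000, seed=1):
    """Monte Carlo estimate of d2 = E[R]/sigma and d3 = SD[R]/sigma,
    where R is the range (max - min) of a subgroup of size n."""
    rng = random.Random(seed)
    ranges = []
    for _ in range(subgroups):
        sample = [draw(rng) for _ in range(n)]
        ranges.append(max(sample) - min(sample))
    mean_range = sum(ranges) / len(ranges)
    return mean_range / sigma, statistics.pstdev(ranges) / sigma

# Standard normal: mean 0, sigma 1 (skewness 0, kurtosis 3).
d2_norm, d3_norm = estimate_d2_d3(lambda rng: rng.gauss(0.0, 1.0), sigma=1.0)

# Exponential with rate 1: sigma 1, strongly right-skewed (illustrative choice).
d2_exp, d3_exp = estimate_d2_d3(lambda rng: rng.expovariate(1.0), sigma=1.0)

print(f"normal:      d2 ~ {d2_norm:.3f}, d3 ~ {d3_norm:.3f}")
print(f"exponential: d2 ~ {d2_exp:.3f}, d3 ~ {d3_exp:.3f}")
```

For n = 5 the tabulated normal-theory values are d_{2} = 2.326 and d_{3} = 0.864, which gives a benchmark for the first line of output. Whether the gap for the skewed case is "significant" is the judgment Burr and Wheeler were making.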
Ryan (2000) also discusses Burr's work
and states, regarding Burr's tabulation
of constants, "These constants, however,
simply facilitate the construction of 3-sigma
limits in the presence of non-normality. The
resultant limits are not probability limits
and the probability of a point falling outside
the limits will, in general, be unknown."
However, the general sense here is that even
highly skewed distributions could be effectively
analyzed with a standard calculation of a control
chart.
When Shewhart (1931, Chapter XIV) presented
the control charts, he used the Tchebycheff inequality
in his discussion of identifying out-of-control
processes. The Tchebycheff inequality states
that with any distribution (or more generally
any set of numbers), at least 1 - (1/t^{2})
of those numbers must fall within the limits
of µ ± tσ. Shewhart
actually discussed the use of the tighter
Camp-Meidell inequality, but based his work
on the Tchebycheff inequality. Other authors
have also discussed these inequalities as
foundations of control chart work.
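
Written out as a formula, the Tchebycheff inequality reads as follows. The random variable X is my notation, and I am assuming it has a finite mean µ and standard deviation σ, with t > 0.

```latex
P\bigl(|X - \mu| < t\sigma\bigr) \;\ge\; 1 - \frac{1}{t^{2}}
\qquad\Longleftrightarrow\qquad
P\bigl(|X - \mu| \ge t\sigma\bigr) \;\le\; \frac{1}{t^{2}}
```

For t = 3 the right-hand form gives 1/9, or about 11.1%, as the largest possible fraction outside the limits.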
Grant and Leavenworth (1990, pp. 106-108) discussed
the Tchebycheff inequality in their Chapter
3: Why The Control Chart Works: Some Statistical
Concepts. Grant and Leavenworth presented
a table that shows the number of cases that
will fall outside the tσ limits for several
values of t for a "roughly normal"
distribution, the Camp-Meidell inequality and
the Tchebycheff inequality. For example, for
3σ, ~0.27% of the points will be outside
the limits for the roughly normal distribution,
while the Tchebycheff inequality states that
no more than ~11.1% of the cases will fall outside
the limits.
Grant and Leavenworth also talk of the Camp-Meidell
inequality in the same section. This inequality
states that, for certain distributions, at least
1 - 1/(2.25t^{2}) of the points
must fall between the µ ± tσ
limits. The constraints on these distributions
are that the distribution be unimodal, that
the mode be the same as the mean and that the
"frequencies must decline continuously
on each side of the mode". Grant and Leavenworth
state that "(m)any of the distributions
that are not normal actually come close enough
to meeting these conditions for the Camp-Meidell
inequality to be applied with confidence."
For the same 3σ as above, this inequality
tells us that no more than ~4.9% of the cases
will fall outside the limits.
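
The three figures quoted above (~0.27%, ~4.9% and ~11.1%) can be reproduced with a few lines of Python. This is only a sketch: it is not Grant and Leavenworth's table, the values of t are my own choice, the function names are made up for illustration, and the "normal" column uses an exact normal distribution rather than a "roughly normal" one.

```python
import math

def normal_outside(t):
    """Two-sided tail area of a normal distribution beyond mu +/- t*sigma."""
    return math.erfc(t / math.sqrt(2.0))

def camp_meidell_outside(t):
    """Camp-Meidell upper bound, 1 / (2.25 t^2), as stated in the text."""
    return 1.0 / (2.25 * t * t)

def chebyshev_outside(t):
    """Tchebycheff upper bound, 1 / t^2."""
    return 1.0 / (t * t)

# One row per t: (t, normal tail, Camp-Meidell bound, Tchebycheff bound).
rows = [(t, normal_outside(t), camp_meidell_outside(t), chebyshev_outside(t))
        for t in (2.0, 2.5, 3.0, 4.0)]

print(f"{'t':>4} {'normal':>10} {'Camp-Meidell':>14} {'Tchebycheff':>13}")
for t, p_norm, p_cm, p_ch in rows:
    print(f"{t:>4.1f} {p_norm:>10.4%} {p_cm:>14.2%} {p_ch:>13.2%}")
```

The normal column is an actual probability, while the other two columns are only upper bounds, so the columns are not strictly comparable.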
Florac and Carleton (1999) also discussed both
inequalities and stated that "When we couple
these empirical observations with the information
provided by Tchebycheff's inequality and
the Camp-Meidell inequality, it is safe to say
that 3-sigma limits will never cause an excessive
rate of false alarms, even when the underlying
distribution is distinctly non-normal."
The "empirical observations" that Florac and
Carleton mention are Wheeler's Empirical Rule.
Both inequalities give us only bounds on
the probability, with no indication of what
the actual probabilities are. We know from Tchebycheff
that for 3 sigma at least 1 - (1/3^{2})
≈ 0.889 of the values must fall within ±3σ.
The actual proportion could be higher, but we
just don't know what that proportion is.
Ryan's comment quoted above certainly applies
here as well.
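
To illustrate how far the actual proportion can sit from the bound, here is a small sketch for one distribution where the answer is known exactly. The exponential distribution is my own choice of example, not one taken from the cited authors, and the function name is hypothetical.

```python
import math

def exponential_outside(t, rate=1.0):
    """Exact probability that an exponential variable falls outside
    mu +/- t*sigma. For an exponential, mu = sigma = 1/rate."""
    mu = sigma = 1.0 / rate
    lower = mu - t * sigma
    upper = mu + t * sigma
    # X is never negative, so nothing falls below a limit at or under zero.
    p_below = 1.0 - math.exp(-rate * lower) if lower > 0 else 0.0
    p_above = math.exp(-rate * upper)
    return p_below + p_above

t = 3.0
p_normal = math.erfc(t / math.sqrt(2.0))  # exact normal, two-sided
p_expo = exponential_outside(t)           # exact exponential
p_bound = 1.0 / t**2                      # Tchebycheff upper bound

print(f"normal:            {p_normal:.4%}")
print(f"exponential:       {p_expo:.4%}")
print(f"Tchebycheff bound: {p_bound:.4%}")
```

The exponential distribution has its mode at zero rather than at its mean, so it does not meet the Camp-Meidell conditions. Only the Tchebycheff bound is guaranteed to hold for it.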
The function of control charts is to indicate
the possible presence of assignable cause variation.
With a truly normal distribution, the probability
that a point is outside of the control limits
due to chance causes is much less than the
worst case for a distribution
where only the Tchebycheff or Camp-Meidell inequalities
apply. We can assign a probability for a normal
distribution if we know µ
and σ. However, if we estimate these parameters,
our confidence in this probability suffers.
Control charts can still be effective even for
non-normal distributions; Shewhart showed this.
The knowledge that the probability of a Type
I error (deciding that the universe has changed
when in actuality it hasn’t) may be greater
because of a less than perfect distribution
may temper the level of your efforts to find
assignable causes. For 3σ limits, up to,
but no more than, ~11.1% of your points will
show as out-of-control because of chance causes.
Without knowing the actual distribution, you
don't know the probability that you may
be chasing a chance cause of variation. But
since we never have a perfect distribution and
because we estimate the parameters of the distribution,
we never really know what that probability is.
References:
Florac, W.A. & Carleton, A.D. (1999). Measuring
the Software Process: Statistical Process Control
for Software Process Improvement. Boston: Addison-Wesley.
Grant, E.L. & Leavenworth, R.S. (1999).
Statistical Quality Control (7th ed.). Boston:
McGraw-Hill.
Ryan, T.P. (2000). Statistical Methods for
Quality Improvement (2nd ed.). New York: John Wiley
& Sons, Inc.
Shewhart, W.A. (1931). Economic Control of Quality
of Manufactured Product. New York: D. Van Nostrand
Company, Inc.
Wheeler, D.J. (2004). Advanced Topics in Statistical
Process Control: The Power of Shewhart's
Charts (2nd ed.). Knoxville, TN: SPC Press.
