What is normal, and how normal does a distribution
need to be for a control chart to be effective?
The answer, surprisingly, may be: not very.
In this article, I will present two ways
of looking at this issue: first, reports of
work on the effect of skewness and kurtosis
on control chart parameters; and second,
the tσ limits in relation to Shewhart's
early work.
Wheeler (2004) refers to Burr's work with
skewed distributions, in which Burr found that
the theoretical values of d₂ and d₃ for these
distributions did not differ significantly
from those of a normal distribution (skewness
= 0, kurtosis = 3). As summarized by Wheeler,
Burr indicated that highly skewed and long-tailed
distributions could be analyzed with a control
chart using the normal values of d₂ and d₃.
Wheeler reproduces several of Burr's
distributions, as well as graphs of the d₂ and
d₃ values for the skewed distributions. The
graphs show no significant difference from
the normal values.
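Since d₂ and d₃ are simply the mean and standard deviation of the subgroup range, expressed in units of the process σ, they can be estimated by simulation for any distribution. The sketch below is only an illustration, not Burr's method: the Monte Carlo approach, the function and sampler names, the subgroup size n = 5, and the choice of the exponential distribution (skewness 2) are all assumptions made here. The exponential is not one of the distributions Burr studied, so its numbers should not be read as reproducing his tables.

```python
import random
import statistics


def estimate_d2_d3(sampler, sigma, n, subgroups=50_000, seed=1):
    """Monte Carlo estimate of d2 = E[R]/sigma and d3 = SD(R)/sigma, where R is
    the range of a subgroup of size n drawn with `sampler` (true SD = sigma)."""
    rng = random.Random(seed)
    ranges = []
    for _ in range(subgroups):
        subgroup = [sampler(rng) for _ in range(n)]
        ranges.append(max(subgroup) - min(subgroup))
    d2 = statistics.fmean(ranges) / sigma
    d3 = statistics.stdev(ranges) / sigma
    return d2, d3


def normal_sampler(rng):
    # Normal(0, 1); tabulated constants for n = 5 are d2 = 2.326, d3 = 0.864.
    return rng.gauss(0.0, 1.0)


def exponential_sampler(rng):
    # Exponential with rate 1: right-skewed (skewness 2), sigma = 1.
    return rng.expovariate(1.0)


for name, sampler in [("normal", normal_sampler), ("exponential", exponential_sampler)]:
    d2, d3 = estimate_d2_d3(sampler, sigma=1.0, n=5)
    print(f"{name:12s} d2 = {d2:.3f}  d3 = {d3:.3f}")
```

Comparing the two printed rows shows how far a particular skewed shape moves the constants; how much that difference matters for the chart is the judgment Burr and Wheeler are making.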
Ryan (2000) also discusses Burr's work
and states, regarding Burr's tabulation
of constants: "These constants, however,
simply facilitate the construction of 3-sigma
limits in the presence of nonnormality. The
resultant limits are not probability limits
and the probability of a point falling outside
the limits will, in general, be unknown."
However, the general sense here is that even
highly skewed distributions can be analyzed
effectively with the standard control chart
calculations.
When Shewhart (1931, Chapter XIV) presented
the control chart, he used the Tchebycheff inequality
in his discussion of identifying out-of-control
processes. The Tchebycheff inequality states
that for any distribution (or, more generally,
any set of numbers), at least 1 − 1/t²
of those numbers must fall within the limits
µ ± tσ. Shewhart also discussed the
Camp-Meidell inequality, which gives a tighter
bound under more restrictive assumptions, but
he based his work on the Tchebycheff inequality.
Other authors have also discussed these
inequalities as foundations of control chart work.
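Because the inequality holds for any set of numbers, it can be checked directly on an arbitrary data set. In this sketch the function names and the lopsided data set are illustrative assumptions; the mean and σ are the population values of the data themselves, which is the setting in which the bound is guaranteed.

```python
import random
import statistics


def chebyshev_min_inside(t):
    """Tchebycheff lower bound on the fraction within mean ± t*sigma.
    The bound is vacuous (0) for t <= 1."""
    return max(0.0, 1.0 - 1.0 / t**2)


def fraction_inside(values, t):
    """Observed fraction of `values` within mean ± t*sigma, using the
    population mean and standard deviation of the values themselves."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values, mu)
    return sum(abs(x - mu) <= t * sigma for x in values) / len(values)


rng = random.Random(42)
# A deliberately lopsided, hypothetical data set: many small values plus a few large ones.
data = [rng.random() for _ in range(1000)] + [rng.uniform(20, 50) for _ in range(30)]

for t in (1.5, 2, 3):
    print(f"t={t}: observed {fraction_inside(data, t):.3f} "
          f">= bound {chebyshev_min_inside(t):.3f}")
```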
Grant and Leavenworth (1990, pp. 106-108) discussed
the Tchebycheff inequality in their Chapter 3,
"Why the Control Chart Works: Some Statistical
Concepts." They presented a table showing the
proportion of cases that will fall outside the
tσ limits, for several values of t, under a
roughly normal distribution, the Camp-Meidell
inequality, and the Tchebycheff inequality.
For example, for 3σ, about 0.27% of the points
will fall outside the limits for the roughly
normal distribution, while the Tchebycheff
inequality states that no more than about 11.1%
of the cases will fall outside the limits.
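A comparison of the same kind can be computed in a few lines. This sketch uses an exactly normal distribution rather than a "roughly normal" one, and the values of t are chosen here for illustration; they are not necessarily the rows of Grant and Leavenworth's table.

```python
import math


def normal_outside(t):
    """Two-sided probability that a normal value falls outside mean ± t*sigma."""
    return math.erfc(t / math.sqrt(2))


def chebyshev_max_outside(t):
    """Tchebycheff upper bound on the fraction outside mean ± t*sigma (any distribution)."""
    return min(1.0, 1.0 / t**2)


print(f"{'t':>4} {'normal':>10} {'Tchebycheff':>12}")
for t in (1.5, 2.0, 2.5, 3.0):
    print(f"{t:>4} {normal_outside(t):>10.4%} {chebyshev_max_outside(t):>12.2%}")
```

At t = 3 the normal column gives about 0.27% and the Tchebycheff column 11.11%, the two figures quoted above.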
Grant and Leavenworth also discussed the Camp-Meidell
inequality in the same section. This inequality
states that, for certain distributions, at least
1 − 1/(2.25t²) of the points must fall within
the µ ± tσ limits. The conditions on these
distributions are that the distribution be
unimodal, that the mode coincide with the mean,
and that the frequencies decline continuously
on each side of the mode. Grant and Leavenworth
state that "(m)any of the distributions
that are not normal actually come close enough
to meeting these conditions for the Camp-Meidell
inequality to be applied with confidence."
For the same 3σ limits as above, this inequality
tells us that no more than about 4.9% of the cases
will fall outside the limits.
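To see the Camp-Meidell bound at work, the sketch below compares it with the exact tail of a Laplace (double exponential) distribution, which is unimodal, has its mode at its mean, and declines continuously on each side, so it meets the stated conditions. The Laplace example and the function names are assumptions made for illustration; the bound is evaluated only at t = 2, 2.5 and 3.

```python
import math


def camp_meidell_max_outside(t):
    """Camp-Meidell upper bound on the fraction outside mean ± t*sigma, assuming a
    unimodal distribution whose mode equals its mean and whose frequencies
    decline continuously on each side of the mode."""
    return min(1.0, 1.0 / (2.25 * t**2))


def laplace_outside(t):
    """Exact two-sided tail beyond mean ± t*sigma for a Laplace distribution with
    scale b: sigma = b*sqrt(2), so the tail is exp(-t*sigma/b) = exp(-t*sqrt(2))."""
    return math.exp(-t * math.sqrt(2))


for t in (2.0, 2.5, 3.0):
    print(f"t={t}: Laplace {laplace_outside(t):.2%} <= "
          f"Camp-Meidell {camp_meidell_max_outside(t):.2%} <= "
          f"Tchebycheff {1 / t**2:.2%}")
```

The ordering in each printed row shows that the Camp-Meidell bound is tighter than Tchebycheff's but, like it, still only a ceiling on the actual tail.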
Florac and Carleton (1999) also discussed both
inequalities and stated: "When we couple
these empirical observations with the information
provided by Tchebycheff's inequality and
the Camp-Meidell inequality, it is safe to say
that 3-sigma limits will never cause an excessive
rate of false alarms, even when the underlying
distribution is distinctly nonnormal."
The empirical observations Florac and Carleton
refer to are Wheeler's Empirical Rule.
Both inequalities give us only bounds on
the probability, with no indication of what
the actual probabilities are. We know from Tchebycheff
that for t = 3, at least 1 − 1/t² = 1 − 1/9
≈ 0.889 of the values must fall within ±3σ.
The actual proportion could be higher, but we
just don't know what it is.
Ryan's comment quoted above certainly applies
here as well.
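The gap between the bound and the actual proportion depends entirely on the distribution. The sketch below computes, from closed-form expressions, the exact proportion within ±3σ for four distributions chosen here purely for illustration; the Tchebycheff floor of 1 − 1/9 ≈ 0.889 is the same for all of them.

```python
import math

T = 3.0
tchebycheff_floor = 1 - 1 / T**2  # ~0.889, guaranteed for any distribution

# Exact probability of falling inside mean ± 3*sigma for a few illustrative distributions.
inside = {
    "normal": math.erf(T / math.sqrt(2)),
    # Exponential (rate 1): mean 1, sigma 1, limits 1 - T and 1 + T. For T >= 1 the
    # lower limit is at or below 0, so only the upper tail P(X > 1 + T) matters.
    "exponential": 1 - math.exp(-(1 + T)),
    # Laplace: sigma = b*sqrt(2), so P(|X - mu| > T*sigma) = exp(-T*sqrt(2)).
    "laplace": 1 - math.exp(-T * math.sqrt(2)),
    # Uniform on (0, 1): sigma = 1/sqrt(12); P(|X - 0.5| <= c) = min(1, 2c).
    "uniform": min(1.0, 2 * T / math.sqrt(12)),
}

for name, p in inside.items():
    print(f"{name:12s} inside ±3σ: {p:.4f}  (Tchebycheff floor {tchebycheff_floor:.3f})")
```

Every row clears the floor, but by different margins; knowing only the bound, we cannot tell which row describes our process.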
The function of control charts is to indicate
the possible presence of assignable-cause variation.
With a truly normal distribution, the probability
that a point falls outside the control limits
because of chance causes is much smaller than
the bound we can state for a distribution
where only the Tchebycheff or Camp-Meidell
inequality applies. We can assign a probability
for a normal distribution if we know µ and σ.
However, if we estimate these parameters,
our confidence in this probability suffers.
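The effect of estimating µ and σ can be illustrated by simulation. In this sketch the process is assumed to be exactly normal and in control, limits are set from a hypothetical baseline of 25 individual values, and σ is estimated with the sample standard deviation for simplicity (an assumption; charts for individuals commonly use the moving range instead). For each simulated baseline, the code computes the true probability that a new chance-cause point falls outside the estimated limits.

```python
import math
import random
import statistics


def normal_tail_outside(lower, upper, mu=0.0, sigma=1.0):
    """Exact probability that a N(mu, sigma) value falls outside [lower, upper]."""
    below = 0.5 * math.erfc((mu - lower) / (sigma * math.sqrt(2)))
    above = 0.5 * math.erfc((upper - mu) / (sigma * math.sqrt(2)))
    return below + above


def realized_false_alarm_rates(baseline_size=25, trials=2000, seed=7):
    """For each trial, estimate mu and sigma from an in-control N(0, 1) baseline,
    set 3-sigma limits, and record the true false-alarm probability of those limits."""
    rng = random.Random(seed)
    rates = []
    for _ in range(trials):
        baseline = [rng.gauss(0.0, 1.0) for _ in range(baseline_size)]
        mean = statistics.fmean(baseline)
        s = statistics.stdev(baseline)
        rates.append(normal_tail_outside(mean - 3 * s, mean + 3 * s))
    return rates


rates = sorted(realized_false_alarm_rates())
print(f"nominal rate with known mu and sigma: {normal_tail_outside(-3, 3):.4%}")
print(f"realized rates with estimated limits: min {rates[0]:.4%}, "
      f"median {statistics.median(rates):.4%}, max {rates[-1]:.4%}")
```

Even for a perfectly normal process, the spread between the minimum and maximum realized rates shows how much the nominal 0.27% depends on the quality of the estimates.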
Control charts can still be effective even for
nonnormal distributions; Shewhart showed this.
Knowing that the probability of a Type I error
(deciding that the universe has changed when
in actuality it hasn't) may be greater because
of a less-than-perfect distribution may temper
the level of your efforts to find assignable
causes. For 3σ limits, the Tchebycheff inequality
guarantees that at most about 11.1% of your points
will show as out of control because of chance causes.
Without knowing the actual distribution, you
don't know the probability that you may
be chasing a chance cause of variation. And
since we never have a perfect distribution, and
because we estimate the parameters of the distribution,
we never really know what that probability is.
References: