Estimating Standard Deviation
Q: I need an estimate of the standard
deviation of the weights of a production lot
of large and bulky parts, but I don't have
time or facilities to measure a bunch of parts.
Is there any "quick and easy" way
to estimate the standard deviation?
A: You can use some properties of the
normal distribution to help you easily estimate
the standard deviation of a population. This
approximation will be pretty good as long as
the underlying distribution is fairly normal,
as in many situations with process data.
The normal distribution is the familiar, mound-shaped
curve from the Statware logo. It
is frequently found in process data, where the
majority of measurements cluster around a central
location, the mean, and the number of measurements
observed decreases as the distance from the
mean increases. A normal distribution curve
is shown below (from Montgomery, Introduction
to Process Quality Control).
As shown in the figure, 68.26% of the observations
of a normal population will be found within
1 standard deviation of the mean. 95.46% of
the observations will be found within 2 standard
deviations, while 99.73% will be found within
3 standard deviations. Thus, almost 100% of
the observations will fall within a span
of six standard deviations, three below the
mean and three above the mean. (This is why
process capability calculations treat the process
spread as six standard deviations.)
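
The coverage percentages can be checked with a few lines of Python.
This is a minimal sketch using only the standard library; the function
name coverage_within is illustrative and not part of the original
answer. It relies on the fact that, for a normal distribution, the
fraction of the population within k standard deviations of the mean
is erf(k / sqrt(2)).

```python
import math

def coverage_within(k):
    """Fraction of a normal population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {coverage_within(k):.2%}")
```

The computed values agree with the figures quoted above to within
about 0.01 percentage point.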
To estimate the standard deviation of a population,
first determine the largest observation that
you would reasonably expect to see. This should
be a real measurement, as opposed to a reading
that could result from measurement error.
Some processes define a value called the upper
reasonable limit that could be used instead. Next, determine
the smallest observation that you would reasonably
expect to see, or use the lower reasonable limit.
The estimate of the standard deviation can
now be calculated using:

    estimated standard deviation = (largest value - smallest value) / 6
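
As a rough sketch, the calculation divides the range between the
largest and smallest expected observations by six. The Python below
assumes that form. The function name and the example weights (a
largest expected part of 112 kg and a smallest of 100 kg) are
hypothetical and only illustrate the arithmetic.

```python
def estimate_std_dev(largest, smallest, divisor=6.0):
    """Range-based estimate of standard deviation: (largest - smallest) / divisor."""
    if largest < smallest:
        raise ValueError("largest must not be less than smallest")
    return (largest - smallest) / divisor

# Hypothetical part weights in kilograms.
sigma_hat = estimate_std_dev(largest=112.0, smallest=100.0)
print(sigma_hat)  # 2.0
```
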
While this estimate is not as reliable as an
estimate based on calculations over a large
number of parts, it is often useful as a preliminary
estimate. Often, the estimate is calculated
by dividing by 4 instead of 6, especially if
the largest and smallest possible observations
are not well known. Dividing by 4 assumes only
that the known values span about two standard
deviations on either side of the mean, allowing for
larger or smaller observations that are unknown.
Because the denominator is smaller, using 4 produces
a larger, more conservative estimate of the
standard deviation.
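
To see the effect of the denominator, the short Python sketch below
compares the two divisors for the same range. The range value is
hypothetical. Whatever the range, dividing by 4 gives an estimate
1.5 times the one obtained by dividing by 6.

```python
# Hypothetical range (largest minus smallest expected weight), in kilograms.
weight_range = 12.0

estimate_div6 = weight_range / 6  # range treated as spanning 6 standard deviations
estimate_div4 = weight_range / 4  # range treated as spanning 4 standard deviations

print(estimate_div6, estimate_div4)   # 2.0 3.0
print(estimate_div4 / estimate_div6)  # 1.5
```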
