Q: How do I use a normal probability
plot to assess the normality of a population?
A: Probability plotting is a graphical
method for determining whether sample data conform
to a hypothesized distribution, based on a subjective
visual examination of the data. In process management
we are typically concerned about whether the
data is distributed according to a normal distribution,
since many of the statistical inference procedures
that we use require the assumption of normality
of the data.
To construct a probability plot:
1. Rank the observations from smallest
   to largest: x(1), x(2), . . ., x(n).
2. Plot the ordered observations x(j)
   against their observed cumulative
   frequency, typically j/(n + 1), on a
   graph with the y-axis appropriately
   scaled for the hypothesized distribution.
3. If the hypothesized distribution adequately
   describes the data, the plotted points fall
   approximately along a straight line. If
   the plotted points deviate significantly
   from the straight line, especially at the
   ends, then the hypothesized distribution
   is not appropriate.
4. In assessing the "closeness"
   of the points to a straight line, the "fat
   pencil" test is often used: imagine a fat
   pencil laid along the line. If the
   points are all covered by the imaginary
   pencil, then the hypothesized distribution
   is likely to be appropriate.
Example: The following data represents the
thickness of plastic sheet, in microns:
43, 52, 55, 47, 47, 49, 53, 56, 48, 48
| Ordered data | Rank order (j) | Cumulative frequency (j/(n + 1)) |
| 43           | 1              | 1/11 = .0909                     |
| 47           | 2              | 2/11 = .1818                     |
| 47           | 3              | 3/11 = .2727                     |
| 48           | 4              | 4/11 = .3636                     |
| 48           | 5              | 5/11 = .4545                     |
| 49           | 6              | 6/11 = .5455                     |
| 52           | 7              | 7/11 = .6364                     |
| 53           | 8              | 8/11 = .7273                     |
| 55           | 9              | 9/11 = .8182                     |
| 56           | 10             | 10/11 = .9091                    |
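As a rough cross-check of the table, the ranking and cumulative frequency steps can be sketched in a few lines of Python. The function name `plotting_table` is a hypothetical helper introduced here for illustration; it is not part of Statit.

```python
# Thickness data from the example above, in microns.
thickness = [43, 52, 55, 47, 47, 49, 53, 56, 48, 48]


def plotting_table(data):
    """Return (ordered value, rank j, j/(n + 1)) for each observation.

    Hypothetical helper: ranks the data and applies the j/(n + 1)
    cumulative frequency described in the construction steps.
    """
    ordered = sorted(data)
    n = len(ordered)
    return [(x, j, j / (n + 1)) for j, x in enumerate(ordered, start=1)]


n = len(thickness)
for x, j, p in plotting_table(thickness):
    # One row per observation: ordered value, rank, cumulative frequency.
    print(f"{x:>4} {j:>3}   {j}/{n + 1} = {p:.4f}")
```

Tied values (47 and 48 each occur twice) simply receive consecutive ranks here, matching the table.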
The ordered data is then plotted against its
respective cumulative frequency. Note how the
y-axis is scaled so that a straight line will
result for normal data.
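One way to reproduce this axis scaling without special graph paper is to convert each cumulative frequency to the corresponding standard normal quantile and plot the ordered data against those values on ordinary linear axes. The sketch below does that with Python's standard library. The correlation coefficient it reports is an added numerical companion to the visual check, not something the procedure above prescribes, and the helper names `normal_scores` and `straightness` are hypothetical.

```python
from statistics import NormalDist, mean

# Thickness data from the example above, in microns.
thickness = [43, 52, 55, 47, 47, 49, 53, 56, 48, 48]


def normal_scores(data):
    """Return the ordered data and the standard normal quantile of j/(n + 1)."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z = [std_normal.inv_cdf(j / (n + 1)) for j in range(1, n + 1)]
    return ordered, z


def straightness(x, z):
    """Pearson correlation between the ordered data and their normal scores."""
    mx, mz = mean(x), mean(z)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((b - mz) ** 2 for b in z)
    return sxz / (sxx * szz) ** 0.5


ordered, z = normal_scores(thickness)
r = straightness(ordered, z)
print(f"correlation of plotted points: {r:.3f}")
```

A correlation close to 1 indicates that the points lie near a straight line. It does not replace looking at the plot, since curvature at the ends can be hidden by a high overall correlation.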

Based on the normal probability plot and the
"fat pencil" criterion, the thickness data
appears consistent with a normal distribution.
Thus, using further statistical
tests that require the assumption of normality
is appropriate. Statistical tests based on the
t-distribution and the F-distribution are fairly
robust to minor departures from normality, so
a subjective visual examination of the probability
plot is usually sufficient to use these tests
with confidence.
Statistical Test for Normality
Statit can also perform a Shapiro-Wilk hypothesis
test on the normality of the data for sample
sizes in the range [10,1000]. The null hypothesis
is that the data comes from a normal distribution:
H0: Population is normal
If the p-value is smaller than the chosen
significance level, usually 0.05, H0 is rejected
and we conclude that the population is not normal.
In the above case, the p-value for the test
of normality is 0.246, so we do not reject H0:
the data is consistent with a normal population.
Failing to reject H0 does not prove normality,
but it agrees with the conclusion we reached
using the "fat pencil" test on the
probability plot.
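The decision rule itself is simple enough to write down as code. The sketch below assumes the p-value has already been obtained from a Shapiro-Wilk implementation (Statit, or for example `scipy.stats.shapiro` in Python). The function name `normality_decision` is hypothetical, and 0.05 is the usual threshold quoted above.

```python
def normality_decision(p_value, alpha=0.05):
    """Apply the decision rule for H0: population is normal.

    alpha is the threshold the p-value is compared against (0.05 by default).
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if p_value < alpha:
        return "reject H0"
    return "do not reject H0"


# p-value reported for the thickness data in the text above.
print(normality_decision(0.246))  # prints "do not reject H0"
```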
Advantages of Probability Plots
- Normal probability plots work well as
  a quick check on normality.
- Probability plots work well for both
  large and small samples, as opposed to
  other statistical tests which have more
  limited ranges of sample sizes. For example,
  Shapiro-Wilk can usually only be used
  for sample sizes in the range [10,1000],
  while goodness of fit tests, such as the
  Chi-Square test, usually require at least
  50 to 100 observations for meaningful
  tests.
- Probability plots help us
  investigate the normality of residuals from
  regression or ANOVA models. Residuals are
  not independent of each other, since they
  are calculated from the underlying model
  that was fit to the data, whereas the other
  statistical tests of normality require
  independent observations.
- Probability plots can be constructed
  for distributions other than the normal
  distribution.
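To illustrate the last point, the same construction carries over to another hypothesized distribution by swapping the quantile function used to scale the axis. The sketch below uses the exponential distribution, whose unit-rate quantile function is -ln(1 - p). The function name and the waiting-time values are hypothetical, invented purely for illustration.

```python
import math


def exponential_scores(n):
    """Unit-exponential quantiles at the cumulative frequencies j/(n + 1)."""
    return [-math.log(1 - j / (n + 1)) for j in range(1, n + 1)]


# Hypothetical waiting times, made up for this illustration only.
waits = [0.3, 1.9, 0.8, 4.2, 1.1, 0.5, 2.7]

# Pair each ordered observation with its exponential score; if the
# exponential model fits, these pairs fall near a straight line.
for x, q in zip(sorted(waits), exponential_scores(len(waits))):
    print(f"{x:>5.1f}  {q:.3f}")
```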
Disadvantages of Probability Plots
- People can make different interpretations
  of the plots, or use fatter or thinner pencils.
- Normal probability plots alone do not
  yield a p-value regarding the decision.
  The Shapiro-Wilk test must be performed
  to get a p-value.
How to Have Statit Construct a Normal Probability Plot
Select Graphics -> Distribution Plots -> Probability
Variable: Click on ->, Click on the desired
variable. Click on Close.
When you are ready to draw and view the graph,
choose OK.
NOTE: Normal probability plots are also useful
as process management tools with the addition
of lines at the probabilities associated with
±3 sigma. For more information on this,
see Probability Plot Use in QC.
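For reference, the cumulative probabilities that correspond to ±3 sigma under a normal distribution can be computed directly. This is a minimal sketch using Python's standard library. It shows only where such lines would sit on the probability axis, not how Statit draws them.

```python
from statistics import NormalDist

std_normal = NormalDist()     # standard normal: mean 0, sigma 1
lower = std_normal.cdf(-3)    # cumulative probability at mean - 3 sigma
upper = std_normal.cdf(3)     # cumulative probability at mean + 3 sigma

print(f"{lower:.5f} {upper:.5f}")  # prints "0.00135 0.99865"
```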