The Shewhart p Chart for Comparisons
Robert F. Hart, Ph.D.
Marilyn K. Hart, Ph.D.
An important question regarding test result
validity is: "Given a positive test result,
what is the probability that this test result
is wrong?" This question is answered by
the positive predictive error rate, the number
of false positive test results divided by the
total number of positive test results. The case
study here comes from Statistical Process Control
for Health Care by Marilyn and Robert Hart,
Duxbury, 2002. Table 1 gives data for evaluating
the positive predictive error rate for seven
radiologists in their ultrasound testing for
acute appendicitis. The first three radiologists
are radiology specialists; the last four are
nonspecialists. Table 2 summarizes these results
for the two groups.
Table 1. Acute Appendicitis: Positive Predictive
Errors by Radiologist

Radiologist   False Positives   Total Positives
     1               1                 8
     2               4                12
     3               2                 8
     4               3                 5
     5               5                 7
     6               2                 6
     7               2                 4

Table 2. Acute Appendicitis: Positive Predictive
Errors by Radiologist Group

Radiologist Group   False Positives   Total Positives
Specialists                7                28
Nonspecialists            12                22

One might make a p chart with seven subgroups
to compare the radiologists and/or with two
subgroups to compare the groups. It is common
for time-ordered control charts to have 25 subgroups,
in which case the common 3-sigma limits are
appropriate. However, for fewer subgroups, 3-sigma
limits are too wide to be effective. Recommended
values of T for T-sigma limits are given in
Table 3.
Table 3. Process Evaluation for Special-cause
Variation: Recommended Values of T for T-sigma
Limits.*

# of subgroups      T
       2           1.5
      3-4          2.0
      5-9          2.5
     10-34         3.0
    35-199         3.5
   200-1500        4.0

* The tabular values of T may be used for
the usual case of "no standard given"
with all attribute and variables charts.
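The limits implied above follow the standard p-chart formula, pBar ± T·sqrt(pBar(1−pBar)/n). A minimal Python sketch (the function names are my own, not from the book) combining that formula with the Table 3 lookup:

```python
import math

def t_for_subgroups(k):
    """Look up the recommended T value from Table 3 for k subgroups."""
    if k == 2:
        return 1.5
    if 3 <= k <= 4:
        return 2.0
    if 5 <= k <= 9:
        return 2.5
    if 10 <= k <= 34:
        return 3.0
    if 35 <= k <= 199:
        return 3.5
    if 200 <= k <= 1500:
        return 4.0
    raise ValueError("no recommended T for %d subgroups" % k)

def p_chart_limits(defects, sizes):
    """Return (p_bar, list of (LCL, UCL) per subgroup) with T-sigma limits.

    defects[i] / sizes[i] is the proportion plotted for subgroup i;
    limits are p_bar +/- T * sqrt(p_bar * (1 - p_bar) / n_i),
    clipped to the interval [0, 1].
    """
    p_bar = sum(defects) / sum(sizes)
    t = t_for_subgroups(len(sizes))
    limits = []
    for n in sizes:
        half_width = t * math.sqrt(p_bar * (1 - p_bar) / n)
        limits.append((max(0.0, p_bar - half_width),
                       min(1.0, p_bar + half_width)))
    return p_bar, limits
```

Because subgroup sizes differ, each subgroup gets its own pair of limits, which is why the limits on a p chart typically step up and down.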
A common problem with p charts is that the
subgroup sizes are too small to give valid results.
The required minimum subgroup size is a function
of pBar. For a point to be accepted as valid,
a minimum subgroup size of 1/pBar is needed
and for a point above the upper control limit
to be accepted as valid, a minimum subgroup
size of 4/pBar is needed. Here the pBar for
the error rate is 0.38 (whether one subgroups
the data by radiologist or by radiologist group).
The p chart subgrouped by radiologist (not
shown here) shows that all 7 points fall
below the 2.5-sigma upper control limit, with
the subgroup sizes all adequate (i.e., exceeding
the minimum requirement of 1/0.38 = 2.63, rounded
up to three). With only this analysis, one would
conclude that special-cause variation between
radiologists was NOT DEMONSTRATED. This does
not mean that no special-cause variation existed,
only that a more powerful method of subgrouping
was needed to flush out the lack of statistical
control.
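As a rough numeric check of this conclusion (the data come from Table 1; the script itself is my sketch, assuming the standard p-chart limit formula):

```python
import math

# Table 1 data: (false positives, total positives) per radiologist
data = [(1, 8), (4, 12), (2, 8), (3, 5), (5, 7), (2, 6), (2, 4)]

p_bar = sum(fp for fp, _ in data) / sum(n for _, n in data)  # 19/50 = 0.38
T = 2.5  # recommended for 5-9 subgroups (Table 3)

min_n = math.ceil(1 / p_bar)  # minimum subgroup size for a valid point
for fp, n in data:
    ucl = p_bar + T * math.sqrt(p_bar * (1 - p_bar) / n)
    assert n >= min_n     # every subgroup size clears 1/pBar (three)
    assert fp / n <= ucl  # no point falls above the 2.5-sigma UCL
print("all 7 points within 2.5-sigma limits")
```

Every assertion passes, matching the chart's verdict: no special-cause variation demonstrated at the individual-radiologist level.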
Figure 1 is a p chart with 1.5-sigma limits
profiling the two radiologist groups on their
positive predictive error rate. This chart illuminates
the superior performance of the specialists,
which should be no surprise. The out-of-control
condition can be accepted as valid since the
minimum required subgroup size of 4/0.38 = 10.53
is met by both groups.
Figure 1. p Chart Profiling Radiologist Group
on Positive Predictive Error Rate, 1.5-sigma
Limits
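The Figure 1 result can be reproduced numerically with a short sketch (variable names are illustrative; the limit formula is the standard p-chart one):

```python
import math

# Table 2 data: (false positives, total positives) per group
groups = {"specialists": (7, 28), "nonspecialists": (12, 22)}

p_bar = 19 / 50    # pooled error rate, 0.38
T = 1.5            # recommended for 2 subgroups (Table 3)
min_n = 4 / p_bar  # ~10.53: minimum size to trust a point above the UCL

for name, (fp, n) in groups.items():
    p = fp / n
    sigma = math.sqrt(p_bar * (1 - p_bar) / n)
    ucl = p_bar + T * sigma
    lcl = max(0.0, p_bar - T * sigma)
    # only the above-UCL signal is checked, matching the 4/pBar rule
    flag = "OUT" if (p > ucl and n >= min_n) else "in"
    print(f"{name}: p={p:.3f}, limits=({lcl:.3f}, {ucl:.3f}) -> {flag}")
```

The nonspecialists' point (p = 0.545) falls above its upper limit while the specialists' point stays inside, which is the special-cause signal the chart displays.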
The p chart on individual radiologists had
insufficient power to detect any special-cause
variation. By subgrouping the seven radiologists
into specialists and nonspecialists, the larger
subgroup sizes provided the increased power
needed to detect the special-cause variation.
Should only specialists perform the ultrasound
tests?
For more information, contact Drs. Robert and
Marilyn Hart at robthart@aol.com
or (541) 412-0425.
