Summarized Data: Choosing Appropriate Charts
Circumstances sometimes arise that require
the use of summarized data as input to control
charts. Once data are summarized, information
critical to control charting may be lost or
obscured. This discussion is not intended to
discourage the use of control charts for summarized
data. Instead, the goal is to present issues
associated with the data that must be taken
into consideration in order to have confidence
that the resultant control chart is, in fact,
valid.
The existence of control charts recognizes
the fact that the processes being monitored
have inherent variability. When data are summarized,
the characterization of that variability may
not be available. For example, a dataset that
contains an average value or a ratio without
a measure of dispersion or the number of observations
limits our ability to characterize the variability
of the data. However, it is acceptable practice
to determine normality from summarized values.
In fact, summarized values may more closely
approximate a normal distribution than the raw
data.
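The tendency of summarized values to look more
normal than the raw data can be illustrated with
a small simulation. This is only a sketch under
stated assumptions: the raw data are simulated
from a skewed (exponential) distribution, the
subgroup size of 30 is arbitrary, and skewness
is used as a rough stand-in for a formal normality
test.

```python
import random
import statistics

def skewness(values):
    """Population skewness: third central moment over sigma cubed."""
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / len(values)
    m3 = sum((v - mean) ** 3 for v in values) / len(values)
    return m3 / m2 ** 1.5

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Assumption: 500 subgroups of 30 simulated, strongly skewed raw values.
subgroups = [[rng.expovariate(1.0) for _ in range(30)] for _ in range(500)]
raw = [v for group in subgroups for v in group]
means = [statistics.fmean(group) for group in subgroups]

# A skewness near 0 is consistent with a symmetric, normal-like shape.
print(f"skewness of raw values:     {skewness(raw):.2f}")
print(f"skewness of subgroup means: {skewness(means):.2f}")
```

The subgroup means should show much less skew
than the raw values, which is why a normality
check on summarized values can succeed even when
the underlying observations are not normal.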
Let us explore the possibilities of working
with attribute data that are summarized as a
ratio. If the available values are normally
distributed, then an Individuals chart can be
considered. The issue with this choice is that
the control limits may be more conservative,
that is, narrower, than those calculated from
the raw data. Control limits that are narrower
than the actual control limits of the process
could flag control violations that are truly
part of the normal variation of the process
rather than an indication of the process being
out of control. If the data represent the occurrence
of an event, e.g., patient falls or medication
errors per 1000 patient days, then the Individuals
chart is a viable choice.
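To make the Individuals chart calculation concrete,
the following sketch computes limits from a series
of summarized rates using the average moving range,
which is a common textbook method. The function
name and the monthly rates are hypothetical, and
the software discussed here may estimate the limits
differently.

```python
from statistics import fmean

def individuals_limits(values):
    """Return (center, lcl, ucl) for an Individuals chart.

    Limits are center +/- 2.66 * average moving range, where
    2.66 = 3 / d2 and d2 = 1.128 for moving ranges of span 2.
    """
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = fmean(values)
    half_width = 2.66 * fmean(moving_ranges)
    return center, center - half_width, center + half_width

# Hypothetical monthly rates: falls per 1000 patient days.
falls_per_1000 = [3.1, 2.4, 3.8, 2.9, 3.3, 2.2, 4.0, 3.0]

center, lcl, ucl = individuals_limits(falls_per_1000)
print(f"center={center:.2f}  LCL={lcl:.2f}  UCL={ucl:.2f}")
```

Because only the plotted rates enter the calculation,
the limits carry no information about how many
observations stand behind each rate.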
Another alternative is the Pchart. The Pchart
does not require normally distributed data or
equal subgroup sizes. If the actual subgroup
size is available, the ratio can be plotted
as the "data variable" and the subgroup
variable used for the subgroup size. This will
more accurately reflect the possible variability
of the data in the control limits. If the subgroup
size is not known, choose an appropriate subgroup
size, e.g., 100 or 1000, as representative of the
ratio denominator. Depending on the actual subgroup
size, the resultant Pchart may have control
limits that are wider than would otherwise occur
using the nonsummarized data. Wider control
limits may mask control violations that would
occur with more realistic limits.
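The effect of the chosen subgroup size on the
Pchart limits can be sketched with the standard
three-sigma formula for proportions. The ratios
and the function name below are hypothetical,
and the ratios are assumed to be expressed as
proportions between 0 and 1.

```python
from math import sqrt

def p_chart_limits(proportions, sizes):
    """Return (p_bar, [(lcl, ucl), ...]) with one limit pair per subgroup."""
    # Weighted average proportion across all subgroups.
    p_bar = sum(p * n for p, n in zip(proportions, sizes)) / sum(sizes)
    limits = []
    for n in sizes:
        half_width = 3 * sqrt(p_bar * (1 - p_bar) / n)
        # A proportion cannot fall outside the interval [0, 1].
        limits.append((max(0.0, p_bar - half_width),
                       min(1.0, p_bar + half_width)))
    return p_bar, limits

# Hypothetical summarized ratios, already expressed as proportions.
ratios = [0.08, 0.11, 0.10, 0.12, 0.09]

# Compare an assumed subgroup size of 100 with one of 1000.
for assumed_n in (100, 1000):
    p_bar, limits = p_chart_limits(ratios, [assumed_n] * len(ratios))
    lcl, ucl = limits[0]
    print(f"n={assumed_n}: center={p_bar:.3f}  LCL={lcl:.3f}  UCL={ucl:.3f}")
```

The smaller assumed subgroup size yields the wider
limits, so an assumed size well below the true
denominator is the case in which violations are
most likely to be masked.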
If the data in question are variable data that
are normally distributed and already summarized
into group means, ranges and/or standard deviations,
then Xbar, R and S charts can be used. Summarized
data can be used in variable charts by selecting
appropriate options in the dialogue boxes. The
following examples illustrate how to generate
Xbar, R and S charts using these data.
The first example generates an Xbar and R
chart. The data are summarized for each diagnosis.
The data needed to produce the chart are the
average PBD values, range values, subgroup designations
and subgroup sizes. The dialogue to produce
the chart begins the same as a chart using raw
data. Within the Xbar chart dialogue, select
Avg_PBD as the data variable, choose Subgroup
Size using a variable and select Cases. The
important variation in this chart is to select
the Summarized Data button. Check the box labeled
Data are summarized and provide the variable
containing the ranges or standard deviations.
In this case, the subgroup sizes are reasonably
small, so we are going to use the ranges in
the PBD_Ranges variable. This dialogue is shown
in Figure 1.
Figure 1
The control chart is displayed in Figure 2.
Since the subgroup sizes are not constant, the
control limits are not constant. Because of
this, adding the values of the control limits
to the chart does not provide helpful information.
A more useful alternative is to add variables
to the data tips. In this example, it is advantageous
to add the range, subgroup size (n), upper control
limit and lower control limit to the default
data tips for each point.
Figure 2
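The reason the Xbar limits vary with subgroup
size can be seen in a short calculation from the
summarized values alone. This is a sketch using
one common textbook estimator (each range divided
by its d2 constant, then averaged); the software's
exact estimator is not stated here and may differ.
The numbers stand in for the Avg_PBD, PBD_Ranges
and Cases variables and are hypothetical.

```python
from math import sqrt

# Standard d2 constants (expected range of n normal values, in sigma units).
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326, 6: 2.534,
      7: 2.704, 8: 2.847, 9: 2.970, 10: 3.078}

def xbar_limits_from_ranges(means, ranges, sizes):
    """Return (grand_mean, sigma_hat, [(lcl, ucl), ...]) from summarized data."""
    # Weight each subgroup mean by its size to recover the overall mean.
    grand_mean = sum(m * n for m, n in zip(means, sizes)) / sum(sizes)
    # Each range divided by d2(n) estimates sigma; average the estimates.
    sigma_hat = sum(r / D2[n] for r, n in zip(ranges, sizes)) / len(sizes)
    limits = [(grand_mean - 3 * sigma_hat / sqrt(n),
               grand_mean + 3 * sigma_hat / sqrt(n)) for n in sizes]
    return grand_mean, sigma_hat, limits

# Hypothetical values standing in for Avg_PBD, PBD_Ranges and Cases.
avg_pbd = [4.2, 5.1, 3.8, 4.6]
pbd_ranges = [2.0, 3.1, 1.5, 2.8]
cases = [4, 8, 3, 6]

grand_mean, sigma_hat, limits = xbar_limits_from_ranges(avg_pbd, pbd_ranges, cases)
for n, (lcl, ucl) in zip(cases, limits):
    print(f"n={n}: LCL={lcl:.2f}  UCL={ucl:.2f}")
```

Each subgroup gets its own pair of limits, and
larger subgroups get narrower ones, which is why
per-point data tips are more informative than
a single labeled limit line.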
The accompanying R chart for these data requires
that the user check the Data are summarized
box under the Summarized Data button. An example
of this dialogue can be seen in Figure 3. It
is helpful to include the subgroup size and
control limits in the data tips for this chart
as well.
Figure 3
The resultant R chart is displayed in Figure
4.
Figure 4
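The R chart limits can be sketched from the same
summarized ranges. The approach below is a common
textbook formulation using the d2 and d3 constants,
offered as an assumption about how such limits
are typically derived rather than as the software's
documented method. The data values are hypothetical.

```python
# Standard d2 and d3 constants for subgroup sizes 2 through 10.
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326, 6: 2.534,
      7: 2.704, 8: 2.847, 9: 2.970, 10: 3.078}
D3 = {2: 0.853, 3: 0.888, 4: 0.880, 5: 0.864, 6: 0.848,
      7: 0.833, 8: 0.820, 9: 0.808, 10: 0.797}

def r_chart_limits(ranges, sizes):
    """Return (sigma_hat, [(lcl, center, ucl), ...]), one triple per subgroup."""
    # Each range divided by d2(n) estimates sigma; average the estimates.
    sigma_hat = sum(r / D2[n] for r, n in zip(ranges, sizes)) / len(sizes)
    rows = []
    for n in sizes:
        center = D2[n] * sigma_hat      # expected range for this subgroup size
        spread = 3 * D3[n] * sigma_hat  # three standard errors of the range
        rows.append((max(0.0, center - spread), center, center + spread))
    return sigma_hat, rows

# Hypothetical values standing in for PBD_Ranges and Cases.
pbd_ranges = [2.0, 3.1, 1.5, 2.8]
cases = [4, 8, 3, 6]

sigma_hat, rows = r_chart_limits(pbd_ranges, cases)
for n, (lcl, center, ucl) in zip(cases, rows):
    print(f"n={n}: LCL={lcl:.2f}  center={center:.2f}  UCL={ucl:.2f}")
```

With unequal subgroup sizes, the center line of
the R chart also shifts from point to point, not
just the limits, because the expected range grows
with the subgroup size.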
The data set in the following example has larger
subgroup sizes. Instead of a variable with the
subgroup ranges, this dataset contains the subgroup
standard deviation as shown in Figure 5. When
the variable is entered, there is no option to
specify the type of value being passed to the
control chart.
Figure 5
By default, control limits for Xbar charts
are calculated using subgroup range values.
However, the use of range values is valid only as
long as the subgroup sizes are between 2 and
30, inclusive. This dataset uses the Regional_Cases
variable for subgroup sizes. These values are
well beyond the allowable subgroup size. Attempting
to produce an Xbar chart with the current selections
would result in an error as shown in the example
in Figure 6. Note that this is basically the
same dialogue that was used to generate the
Xbar chart in Figure 2, even though this data
set has much larger subgroup sizes and sample
standard deviation values instead of the range
values. The software recognizes that the subgroup
sizes are not compatible with the choices that
have been made up to this point. It is, therefore,
necessary to specify that the control limits
be calculated using the subgroup standard deviations.
Figure 6
The flag to specify that the control limits
are to be calculated based on standard deviations
instead of ranges is found in the dialogue in
the Control Limits button. The dialogue is shown
in Figure 7.
Figure 7
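When the limits are based on standard deviations,
the calculation no longer depends on a table of
range constants, so large subgroups pose no problem.
The sketch below uses the c4 bias-correction constant
and averages the corrected subgroup standard deviations,
which is one common textbook estimator and an assumption
about the method rather than the software's documented
behavior. The subgroup sizes stand in for Regional_Cases;
all values and the other names are hypothetical.

```python
from math import exp, lgamma, sqrt

def c4(n):
    """Bias-correction constant: E[s] = c4(n) * sigma for normal data."""
    # lgamma avoids the overflow that gamma() would hit for large n.
    return sqrt(2.0 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))

def xbar_limits_from_sds(means, sds, sizes):
    """Return (grand_mean, sigma_hat, [(lcl, ucl), ...]) from summarized data."""
    grand_mean = sum(m * n for m, n in zip(means, sizes)) / sum(sizes)
    # Each s divided by c4(n) estimates sigma; average the estimates.
    sigma_hat = sum(s / c4(n) for s, n in zip(sds, sizes)) / len(sizes)
    limits = [(grand_mean - 3 * sigma_hat / sqrt(n),
               grand_mean + 3 * sigma_hat / sqrt(n)) for n in sizes]
    return grand_mean, sigma_hat, limits

# Hypothetical regional means, standard deviations and subgroup sizes.
regional_means = [4.4, 4.9, 4.1]
regional_sds = [1.9, 2.3, 1.7]
regional_cases = [250, 900, 120]

grand_mean, sigma_hat, limits = xbar_limits_from_sds(
    regional_means, regional_sds, regional_cases)
for n, (lcl, ucl) in zip(regional_cases, limits):
    print(f"n={n}: LCL={lcl:.3f}  UCL={ucl:.3f}")
```

For subgroups this large, c4 is very close to 1,
so the correction matters little; the subgroup
size mainly acts through the square root in the
width of the limits.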
As discussed in the previous series of charts,
the control limits vary. The subgroup sizes
and control limits are added to the plotted
data tips. The Xbar chart is displayed in Figure
8.
Figure 8
Generating the S chart involves choices similar
to those for the R chart. The user specifies the variable
containing the standard deviation values and
checks the summarized data box under the Summarized
Data button as illustrated in Figure 9.
Figure 9
The selection of additional data tip variables
produces the final S chart in Figure 10.
Figure 10
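The S chart limits can be sketched in the same
way from the summarized standard deviations. As
before, this is a textbook formulation based on
the c4 constant and is an assumption about the
method, not a description of the software's internals.
The values and function names are hypothetical.

```python
from math import exp, lgamma, sqrt

def c4(n):
    """Bias-correction constant: E[s] = c4(n) * sigma for normal data."""
    return sqrt(2.0 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))

def s_chart_limits(sds, sizes):
    """Return (sigma_hat, [(lcl, center, ucl), ...]), one triple per subgroup."""
    # Each s divided by c4(n) estimates sigma; average the estimates.
    sigma_hat = sum(s / c4(n) for s, n in zip(sds, sizes)) / len(sizes)
    rows = []
    for n in sizes:
        center = c4(n) * sigma_hat                       # expected value of s
        spread = 3 * sigma_hat * sqrt(1 - c4(n) ** 2)    # three standard errors of s
        rows.append((max(0.0, center - spread), center, center + spread))
    return sigma_hat, rows

# Hypothetical regional standard deviations and subgroup sizes.
regional_sds = [1.9, 2.3, 1.7]
regional_cases = [250, 900, 120]

sigma_hat, rows = s_chart_limits(regional_sds, regional_cases)
for n, (lcl, center, ucl) in zip(regional_cases, rows):
    print(f"n={n}: LCL={lcl:.3f}  center={center:.3f}  UCL={ucl:.3f}")
```

The lower limit is floored at zero because a standard
deviation cannot be negative; with large subgroups
the floor rarely comes into play.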
It is important to have as much knowledge about
your data as possible. When confronted with
summarized data, investigate the possibility
of getting more detail, such as subgroup size.
Identify what is known and not known about the
data and how the charts could be affected. Finally,
formulate a realistic idea about what can be
expected from the chart. If the results differ
from expectations, consider a different strategy
or reevaluate the validity of the data.
