Rare Events in Manufacturing


As I travel around to different manufacturing sites, I often see signs that say something like, "45 days without a lost time accident". But what do these signs tell us? From the standpoint of improving our accident rate, not much. The sign gives no history and no indication of whether we are improving or getting worse. But what if we could monitor this process with a Statistical Process Control chart? Then we would be on the road to process improvement.

Events like lost-time accidents and chemical spills are (hopefully) rare in a manufacturing plant. As such, they are difficult to analyze with standard Shewhart charts such as the c, p, or u chart, as the following example of a c chart on chemical spills shows.


Figure 1

Looking at this chart, we really can’t tell what our process is doing. We are looking at the number of spills per month, and obviously this chart will not be of much use to us. Even if we lengthened the time frame to quarters, we would not have a chart that we could really use for decisions on our processes, and we would not have timely information for quickly detecting that something was going wrong. If you explore this with other charts, such as the p, u or even the I chart, you will find the same situation: no real information that you could use for decisions.

With this chart we are using a somewhat arbitrary subgroup (one month), and we would need more spills in a month to get a signal. Moreover, suppose several spills were grouped around the end of a month. We may have had a cluster of spills that we would like to know about, but because of the arbitrary monthly subgroup, we might not detect it.

In the manufacturing environment, we may have a number of events that happen very rarely, and we would like to keep it that way. Examples include lost-time accidents, chemical spills, quality events or unexpected downtime. We may also want to look at rare but critical defects: a high-yield manufacturing process does not produce many defects, so those defects can be treated as rare events too.

These types of data are what we can call rare events. How do we recognize rare-event data?

First, the definition of rare events can be functional. Rare events are basically those events that are so infrequent that they do not fit the models behind standard SPC charts: the data do not fit the Poisson, Binomial or Normal models. The events may even be so infrequent that we cannot get a good feel for the type of distribution, because we do not have enough data to model it.

You could think of rare events as those with very few occurrences over a relatively long time period. For chemical spills, it might be a small number of events in a year. The same might be true of lost-time injuries or unexpected downtime. For high-yield manufacturing, we may have so few critical defects that we see several batches with no defects, followed by a batch with one critical defect. In this case, the time frame would be shorter than in the previous examples but relatively long in terms of batches. Whether something counts as a rare event is therefore determined by the situation.

With standard SPC, we would need to decide on the subgroup size. Often this would be based on some time period over which we count rare events. For the spills, we might count the number of spills in a month, again with the caveat of arbitrary subgrouping. For the batch situation, we might look at defects per day of production, or we might use defects per batch as our subgroup.

But let’s try to put some decision criteria around whether we have a rare-event situation. Basically, we could consider rare events to be those infrequent events for which the assumptions of the standard SPC charts are not met. Rare events are those that:

  • Happen so infrequently that the counts are too discrete for a c or u chart, that is, the average count per subgroup is less than 1
  • Make us wait so long for enough data to satisfy a chart’s requirements that we cannot use it to make timely and rational decisions. For the p chart, the usual guidance is np ≥ 5. With a fraction failed of p = .01, we would need a subgroup size of 500 to satisfy this guidance, and if we wanted at least 25 subgroups to chart, we would need 12,500 data points. That could take a very long time to collect.
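The arithmetic in the second point can be checked with a few lines of Python. This is a minimal sketch that applies the np ≥ 5 rule of thumb quoted above; the function name `min_p_chart_sample` and its defaults are hypothetical, chosen only for illustration.

```python
import math

def min_p_chart_sample(p, min_expected=5.0, n_subgroups=25):
    """Smallest subgroup size n with n * p >= min_expected, and the total
    number of items needed to fill n_subgroups such subgroups."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    n = math.ceil(min_expected / p)
    return n, n * n_subgroups

n, total = min_p_chart_sample(0.01)
print(n, total)  # 500 12500
```

Because the required subgroup size scales as 1/p, halving the defect fraction doubles both the subgroup size and the total amount of data needed.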

We need to keep the subgroup time frame reasonably short; otherwise we would wait too long to make a decision. And if we use a large denominator with a small numerator on a p or u chart, we can run into another issue called overdispersion.

With the standard charts, we don’t really have good options for analyzing these data. So how do we monitor these data if standard SPC charts do not do a good job of it?

As an example, let’s look at the chemical spills data used by Wheeler. There were eight spills over roughly four years, and a c chart of these data is shown in Figure 1. As we’ve discussed, this is not a helpful chart: we cannot make good decisions based on this type of information, and we don’t know whether we are getting more spills or not. Other standard SPC charts would be no different.

Why is this? We are looking at count data with a subgroup of one month. As the chart shows, the average number of spills per month is only 0.151, well under 1. When the average count is less than 1, count charts become ineffective: the data are “too discrete”, as this example shows.

Notice how a single spill relates to the average. One spill is almost 7 times the average, yet it still does not show up as an out-of-control point. This illustrates that the c, u, p and np charts become very insensitive for these types of data. Statistically, the data do not fit the Poisson model on which the c and u charts are based.
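To see why a single spill cannot signal, here is a sketch of the usual three-sigma c chart limits (center line plus or minus three times the square root of the average count, with the lower limit floored at zero), evaluated at the average of 0.151 spills per month quoted above. The function name is illustrative.

```python
import math

def c_chart_limits(c_bar):
    """Three-sigma c chart limits for an average count c_bar per subgroup."""
    sigma = math.sqrt(c_bar)  # Poisson model: the variance equals the mean
    ucl = c_bar + 3 * sigma
    lcl = max(0.0, c_bar - 3 * sigma)  # counts cannot be negative
    return lcl, ucl

lcl, ucl = c_chart_limits(0.151)
print(round(ucl, 3))        # 1.317
print(1 > ucl, 2 > ucl)     # False True
print(round(1 / 0.151, 1))  # 6.6
```

With an upper limit of about 1.32, a month needs at least two spills to fall outside the limits, even though one spill is already more than six times the average.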

While not shown here, other charts have similar issues. For example, the p chart needs np ≥ 5, so with p = .01 we would need 500 items per subgroup, and roughly 10,000 to 12,500 items in total for the usual 20 to 25 points on a chart.

However, some alternatives have been proposed.

Rate on an XmR Chart

The first one we will explore is discussed in Wheeler’s Advanced Topics in Statistical Process Control and uses the spills data.

This method calculates the days between spills and converts it to a rate of spills per year, a rate that can then be charted with an XmR chart.

First, calculate the number of days between spills. To use this information with the usual SPC charts, convert it to a rate. For these data, a convenient rate might be spills per year: divide 1 by the number of days between spills to get spills per day, then multiply by 365 to get spills per year. The rates can then be displayed on an XmR chart.
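The calculation just described can be sketched in Python. This is a minimal illustration, not the Statit macro mentioned in this article. The spill dates are made up, the function names are hypothetical, and the limits use the usual XmR scaling factors: 2.66 times the average moving range for the individual values and 3.268 times it for the moving ranges.

```python
from datetime import date
from statistics import mean

def days_between(event_dates):
    """Days elapsed between consecutive events (the dates are sorted first)."""
    d = sorted(event_dates)
    return [(later - earlier).days for earlier, later in zip(d, d[1:])]

def annual_rates(gaps, days_per_year=365):
    """Convert each days-between value to events per year: (1 / days) * 365."""
    if any(g <= 0 for g in gaps):
        raise ValueError("two events on the same day give a zero gap; rate undefined")
    return [days_per_year / g for g in gaps]

def xmr_limits(values):
    """XmR chart limits computed from the average moving range (needs 2+ values)."""
    x_bar = mean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = mean(moving_ranges)
    return {
        "x_center": x_bar,
        "x_upper": x_bar + 2.66 * mr_bar,
        # A rate cannot be negative, so this sketch floors the lower limit at zero.
        "x_lower": max(0.0, x_bar - 2.66 * mr_bar),
        "mr_center": mr_bar,
        "mr_upper": 3.268 * mr_bar,
    }

# Hypothetical spill dates, for illustration only (not Wheeler's data).
spills = [date(2020, 1, 15), date(2020, 6, 2), date(2020, 11, 20), date(2021, 3, 8)]
gaps = days_between(spills)   # [139, 171, 108]
rates = annual_rates(gaps)    # spills per year for each interval
limits = xmr_limits(rates)
```

Note that each interval yields one rate, so n spills give n − 1 points on the chart.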

You can see my method of calculation at live.statit.com under the Rare Event Webinar. Choose the Calculate and XmR macro. At the bottom of the display window is a View Macro Source link that shows the Statit language used to build this chart.

With the XmR chart, we want the rate to be as low as possible. A lower rate means more days between events.

Figure 2

Depending on the spill rate, you may want to use a different time frame to make this chart more meaningful to viewers. For example, is a rate of 1.82 spills per year meaningful?
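One way to judge which time frame communicates best is to restate the same rate several ways. The following quick sketch uses a hypothetical helper. It shows that 1.82 spills per year is about 0.15 spills per month, or roughly one spill every 200 days.

```python
def rate_views(per_year, days_per_year=365):
    """Express an annual event rate in a few alternative time frames."""
    return {
        "per_year": per_year,
        "per_month": per_year / 12,
        "mean_days_between": days_per_year / per_year,
    }

views = rate_views(1.82)
print(round(views["per_month"], 3), round(views["mean_days_between"]))  # 0.152 201
```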

Note that it would also be possible to count the batches between defects and convert that to a rate of defects per batch. We could also count the parts between defects.

This approach uses two charts: because we are using the I chart, we usually want its companion Moving Range chart as well. Also remember that the I chart is most effective when the distribution is fairly close to normal, and for rare-event rates that assumption may be far off.

g Chart

The other alternative is the g chart. This chart is increasingly used in healthcare, where organizations need to monitor rare events such as Surgical Site Infections and the so-called Never Events, which should never happen. But it was originally designed by Dr. James Benneyan for manufacturing applications. The article listed in the references at the bottom of this presentation gives the justification for the g chart, which has proved very effective with healthcare data.

The idea of the g chart is that these types of events are more closely modeled by the geometric distribution, the memoryless discrete distribution. To give you an idea of a geometric distribution, suppose you play a game in which you toss a coin until a head comes up, and let X be the number of tosses. If you repeat this game several times, recording X each time, the distribution of X is the geometric distribution. The geometric distribution is the discrete counterpart of the exponential distribution; here it models the number of coin flips until, or between, heads.
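The coin-toss game is easy to simulate. The simulated frequencies can then be compared with the geometric probabilities P(X = k) = (1 − p)^(k−1) p, whose mean is 1/p (2 tosses for a fair coin). The following is a minimal sketch. The g chart literature also uses a version that counts the failures before the first success, starting at 0, which simply shifts this distribution down by one.

```python
import random
from collections import Counter

def tosses_until_head(rng, p_head=0.5):
    """Toss a coin until the first head; return the number of tosses X."""
    tosses = 1
    while rng.random() >= p_head:  # a tail occurs with probability 1 - p_head
        tosses += 1
    return tosses

def geometric_pmf(k, p):
    """P(X = k) for the number of trials up to and including the first success."""
    return (1 - p) ** (k - 1) * p

rng = random.Random(42)  # fixed seed so the run is reproducible
games = [tosses_until_head(rng) for _ in range(100_000)]
counts = Counter(games)
for k in range(1, 6):
    # simulated relative frequency next to the theoretical probability
    print(k, counts[k] / len(games), geometric_pmf(k, 0.5))
```

A histogram of `games` has the long right tail typical of a geometric distribution, with most games ending after one or two tosses.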

Often with the data we are talking about (8 spills in 4 years), it may be difficult to confirm that we are looking at a geometric distribution, simply because we do not have enough data. Figure 3 shows approximately what a histogram of a geometric distribution looks like.


Figure 3

Obviously it would take a large number of spills or accidents to show that it is geometric.

The g chart measures the time between events or the number of occurrences between events. The control limits are based on the same principles as for other charts, but they are calculated from the properties of the geometric distribution. So we might chart Days Between, Hours Between or Batches Between. A g chart can also look at occurrences between events, such as parts produced between defects. Figure 4 shows an example of the g chart.
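The following sketch shows how geometric-based limits can be computed. It assumes the common three-sigma form in which the data are treated as geometric counts starting at zero, with mean ḡ and standard deviation √(ḡ(ḡ + 1)). Benneyan’s article also discusses probability-based limits and adjustments for a known minimum value, and Statit’s own calculation may differ. The function name and the sample values are hypothetical.

```python
import math
from statistics import mean

def g_chart_limits(counts_between):
    """Three-sigma g chart limits from counts (or days) between events.

    Treats the data as geometric counts starting at zero, so the standard
    deviation is sqrt(g_bar * (g_bar + 1)).
    """
    g_bar = mean(counts_between)
    sigma = math.sqrt(g_bar * (g_bar + 1))
    return {
        "center": g_bar,
        "upper": g_bar + 3 * sigma,
        "lower": max(0.0, g_bar - 3 * sigma),  # 0 whenever g_bar > 0
    }

# Hypothetical days between spills, for illustration only.
limits = g_chart_limits([139, 171, 108])
```

Points above the upper limit indicate unusually long stretches between events, which on a g chart is the good direction.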

Figure 4

Notice that with the g chart, we want the days between to be higher, so the Good Direction is shown at the top. That is an important distinction between the XmR chart of the first alternative and the g chart. With the XmR chart, we are looking at the rate, and we want a lower rate for the rare event: fewer spills per year. With the g chart, we want more days between spills. Notice also that in this case we are not getting the signal for this process that we got with the XmR chart. We are seeing a steady decline, but we need one more decreasing point for a trend signal.
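The trend test mentioned above can be sketched as a simple run counter. The run length is an assumption. Six steadily decreasing points in a row is one common convention (Nelson’s), but the rule used for Figure 4 is not stated, so it is left as a parameter. In this sketch, ties break the run.

```python
def decreasing_trend(values, run_points=6):
    """Index of the point that completes `run_points` consecutive strictly
    decreasing points, or None if no such run occurs."""
    run = 1  # length of the current decreasing run, counted in points
    for i in range(1, len(values)):
        run = run + 1 if values[i] < values[i - 1] else 1
        if run >= run_points:
            return i
    return None

print(decreasing_trend([30, 28, 25, 21, 18]))      # None: only five decreasing points
print(decreasing_trend([30, 28, 25, 21, 18, 12]))  # 5: the sixth point completes the run
```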

Again there are examples of the g chart with other data on our live.statit.com site under the Rare Event Webinar.

The g chart is simple to implement. Your data can be the dates (or date-times) on which events took place, a column of days or times between individual events, or a column of counts of occurrences between adverse events.
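If the data arrive as event time stamps, converting them to the time-between form takes only a few lines. Here is a sketch with made-up time stamps; the function name is hypothetical. The stamps are sorted first so that out-of-order records do not produce negative gaps.

```python
from datetime import datetime

def hours_between(event_times):
    """Hours elapsed between consecutive date-time stamps (sorted first)."""
    t = sorted(event_times)
    return [(b - a).total_seconds() / 3600 for a, b in zip(t, t[1:])]

# Hypothetical event time stamps, deliberately out of order.
events = [datetime(2021, 3, 1, 8, 0), datetime(2021, 3, 4, 14, 30), datetime(2021, 3, 2, 20, 0)]
print(hours_between(events))  # [36.0, 42.5]
```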

The g chart also offers many of the options available for other Statit SPC charts, including phases, reference lines and Assignable Cause points.

The g chart needs only one chart to analyze the process. Generally, the lower control limit does not come into play, because it is bounded by zero.
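Why the lower limit ends up at zero can be seen from the three-sigma form for geometric counts. This assumes the counts start at zero, so that the standard deviation is √(ḡ(ḡ + 1)), which may not match every software implementation:

```latex
\mathrm{LCL} = \bar{g} - 3\sqrt{\bar{g}(\bar{g}+1)} < 0
\quad\text{because}\quad
9\,\bar{g}(\bar{g}+1) = 9\bar{g}^{2} + 9\bar{g} > \bar{g}^{2}
\quad\text{for every } \bar{g} > 0 .
```

Since three standard deviations always exceed the mean, the computed lower limit is negative and is reported as zero.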

Notice that on this chart I have displayed the number of days since the last spill. This number indicates how the process has behaved since that spill. It is important to know how we are doing relative to the last event, but until the next spill occurs, we cannot calculate the next days-between value. So I have color-coded the subtitle: red if the days since the last spill are fewer than the mean, yellow if they are between the mean and the Upper Control Limit, and green if they are above the Upper Control Limit.
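The color rule just described is simple enough to state as code. This is a minimal sketch that assumes the mean and upper control limit come from the g chart. The text does not say how values exactly equal to the mean or the limit are colored, so the boundary choices here are assumptions, and the function name is hypothetical.

```python
def since_last_status(days_since_last, mean_days, ucl_days):
    """Color for the 'days since last event' subtitle.

    red    : fewer days than the g chart mean
    yellow : from the mean up to and including the upper control limit
    green  : above the upper control limit
    """
    if days_since_last < mean_days:
        return "red"
    if days_since_last <= ucl_days:
        return "yellow"
    return "green"

print(since_last_status(150, mean_days=140, ucl_days=560))  # yellow
```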

                    XmR on Rate              g Chart
Charts needed       Double charts            Single chart
Good direction      Down is good             Up is good
Data preparation    Data manipulation        Ease of use (simple data)

So for a g chart, higher is better, and we need only one chart because a single parameter describes the geometric distribution. The data are simple, but the Lower Control Limit is of little use on the g chart. Comparing the two alternatives, the major differences are in interpretation and in the simplicity of the data. Lower is better on the XmR chart, and higher is better on the g chart. The XmR chart may require calculating and manipulating the data and analyzing two charts, while the g chart is simple to implement with simpler data but gives us little information from its Lower Control Limit.

Now, suppose we want this information displayed to the whole organization. With Statit e-QC, you can schedule reports to run frequently so that the information stays up to date, and you can access these reports from your intranet. So you could publish a scheduled report, as shown on live.statit.com, and link it to your intranet so that everyone knows how you are doing now, how that compares to the past, and whether or not you are getting better. This, of course, is a better indicator than a simple sign.

Of course, there are other alternatives, such as transformations or CUSUM charts. But I find that the two charts described here are easier for most users to interpret.

References:

• Benneyan, J.C. (2001). Number-between g-type statistical quality control charts for monitoring adverse events. Health Care Management Science, 4(4), 305–318.
• Wheeler, D.J. (2004). Advanced topics in statistical process control: The power of Shewhart’s charts (2nd ed.). Knoxville, TN: SPC Press.

If you would like additional information, please send email to statit.support@acs-inc.com.