Generate a Reasonable Failure Curve

If the only reliable failure data available is the experience and expertise of the utility's personnel, it is still possible to estimate a reasonable probability of failure curve. This page describes a simple technique that can be used.

Underlying probability distribution - Probability density function (PDF)

Let's say that the asset in question is office chairs. In a workshop setting, the engineering staff estimates that if you buy 100 chairs and put them into use on the same day, half will have failed by the end of year ten. In other words, the mean time to failure of office chairs is ten years. (Enter 10 in the yellow cell labeled Mean.) Strictly speaking, the age by which half have failed is the median, but for the symmetric normal distribution assumed below, the median and the mean are the same.

Since the chairs don't all fail at ten years old, there must be some variance in the time to failure. Let's say in our case that the engineers further estimate that half the chairs will fail between ages eight and twelve. This establishes the width of the probability distribution. Enter 8 in the yellow cell labeled Spread, min. This defines the lower bound of the failure range we are defining. The workbook automatically calculates 12 for the upper bound. This is because we are assuming that this is a normal distribution, which is symmetrical about the mean. There's no particular reason the distribution has to be normal, but since we have so little data it makes sense to choose a probability distribution that is easy to work with. Next, enter 50%, corresponding to the fraction of the population expected to fail between ages eight and twelve, in the range cell.
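The page does not show the workbook's formulas. A common way to turn these three entries into a normal distribution is to treat Spread, min as the quantile that leaves (1 - range)/2 of the failures below it, and the Python sketch below assumes that conversion. The helper name normal_from_workshop is hypothetical. With a mean of 10, a lower bound of 8 and a 50% range, it gives a standard deviation of about 2.97 years and the upper bound of 12.

```python
from statistics import NormalDist

def normal_from_workshop(mean, spread_min, range_fraction):
    """Hypothetical helper: return (sigma, spread_max) for a normal curve
    centred on `mean` with `range_fraction` of failures between the bounds."""
    # The upper bound sits at the 0.5 + range/2 quantile; its standard-normal
    # z-score is the number of standard deviations between the mean and either bound.
    z = NormalDist().inv_cdf(0.5 + range_fraction / 2)
    sigma = (mean - spread_min) / z
    spread_max = 2 * mean - spread_min  # a normal curve is symmetric about its mean
    return sigma, spread_max

sigma, upper = normal_from_workshop(mean=10, spread_min=8, range_fraction=0.50)
print(f"sigma = {sigma:.2f} years, upper bound = {upper}")
```

The same function shows how strongly the inputs drive the shape: widening the range fraction while keeping the bounds fixed shrinks the standard deviation.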

The workbook automatically calculates the bell curve for this distribution, in column C. This is called the Probability Density Function, and it's shown below. This distribution indicates the number of failures we would expect year by year from our initial population of 100 office chairs. You can see that the failures are clustered around the middle; not many chairs will fail in the first few years when they are in good condition, or in the later years, when most of them have failed already and there aren't many left to fail.
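To give a rough idea of what column C holds, the sketch below evaluates the normal PDF at each whole year from 0 to 20 and scales it to 100 chairs. Sampling at whole years is an assumption about how the workbook builds the column, and σ = 2.965 is the value implied by the 8-to-12, 50% inputs.

```python
from statistics import NormalDist

MEAN, SIGMA, POPULATION = 10.0, 2.965, 100  # sigma implied by the 8-to-12, 50% inputs

chairs = NormalDist(MEAN, SIGMA)

# Assumption: the PDF is sampled at each whole year of age and scaled to 100 chairs.
expected_failures = {age: POPULATION * chairs.pdf(age) for age in range(21)}

for age, count in expected_failures.items():
    print(f"age {age:2d}: {count:5.2f} expected failures")
```

The largest value, about 13.5 failures, falls at age 10, and the 21 yearly values add up to very nearly 100 chairs.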

By age 20, we expect that virtually all the chairs will have failed; therefore, a failure of a 20-year-old chair is an unlikely event, since a 20-year-old chair is itself unlikely. However, for the asset tool we are interested in knowing the likelihood of failure of a 20-year-old chair (or any other age) given that it has already reached age 20.

Survival curve

From the PDF, we can calculate the survival curve, which tells us what percentage of the population of chairs is expected to still be in service at a given age. (This is 1 - CDF, where the CDF is the cumulative distribution function, the integral of the PDF.) For example, at age zero, all 100 chairs are in service, as you would expect. For the first few years there is little change; then, somewhere around age five, the population starts to drop off. At age ten, half the chairs have failed (note that this is the mean of our bell-shaped PDF above), and by about age 20 virtually none are left.
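A minimal sketch of the survival curve, using the same assumed parameters (mean 10, σ ≈ 2.965) and Python's statistics.NormalDist for the CDF:

```python
from statistics import NormalDist

MEAN, SIGMA, POPULATION = 10.0, 2.965, 100  # assumed parameters from the chair example

chairs = NormalDist(MEAN, SIGMA)

# Survival = 1 - CDF, scaled to the original population of 100 chairs.
survivors = {age: POPULATION * (1 - chairs.cdf(age)) for age in range(21)}

for age in (0, 5, 10, 15, 20):
    print(f"age {age:2d}: {survivors[age]:6.2f} chairs still in service")
```

Age zero shows about 99.96 chairs rather than exactly 100, because the normal curve places a sliver of probability below age zero; age 10 gives exactly 50.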

This is an interesting bit of information, but it still does not give us the failure probability curve we are looking for. There is one more step.

Calculating the failure probability curve

Remember that the failure probability curve tells us the probability that an asset (i.e., a chair) will fail in a particular year, given that it has already made it to that year. Another way to think of this is: of the chairs that make it to age n, what percentage do I expect to fail before age n+1? This percentage is simply the probability density function (i.e., the number of failures at any given age) divided by the survival curve (i.e., the number of assets we expect to still be in service at any given age). This calculation is shown below.
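The same ratio can be sketched in a few lines of Python, again assuming mean 10 and σ ≈ 2.965. The function name failure_probability is illustrative, not the asset tool's.

```python
from statistics import NormalDist

chairs = NormalDist(10.0, 2.965)  # assumed parameters from the chair example

def failure_probability(age):
    """PDF divided by the surviving fraction: failures per chair still in service."""
    survival = 1 - chairs.cdf(age)
    return chairs.pdf(age) / survival

for age in range(0, 21, 5):
    print(f"age {age:2d}: {failure_probability(age):6.1%}")
```

With these inputs the ratio is about 27% at age 10 and passes 100% before age 20, which is the situation the first question below addresses. Because PDF ÷ survival is an instantaneous rate rather than a true one-year probability, it can exceed 100%; the strictly discrete version, (S(n) - S(n+1)) / S(n), cannot.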

This is the failure probability curve the asset tool is looking for. If we know the consequence of failure for our chairs, we can now calculate the risk cost for any age chair, and we can optimize the replacement timing by minimizing life-cycle cost.
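The page does not spell out how the asset tool turns failure probability into a replacement age. Purely as an illustration, the sketch below uses one simple criterion: the undiscounted average annual cost of replacing every T years, (replacement cost + sum of yearly risk costs) / T, where each year's risk cost is failure probability × consequence of failure. Both dollar figures are hypothetical, and a real life-cycle analysis would typically discount future costs.

```python
from statistics import NormalDist

chairs = NormalDist(10.0, 2.965)  # assumed parameters from the chair example
REPLACEMENT_COST = 300.0          # hypothetical cost of a new chair
FAILURE_COST = 150.0              # hypothetical consequence of one failure

def failure_probability(age):
    """PDF divided by the surviving fraction at this age."""
    return chairs.pdf(age) / (1 - chairs.cdf(age))

def annual_cost(replace_at):
    """Undiscounted average yearly cost if every chair is replaced at `replace_at`."""
    risk = sum(failure_probability(age) * FAILURE_COST for age in range(replace_at))
    return (REPLACEMENT_COST + risk) / replace_at

best_age = min(range(1, 21), key=annual_cost)
print(f"lowest average annual cost when replacing at age {best_age}")
```

Replacing early wastes the purchase price over too few years, while replacing late piles up risk cost; the minimum of the average annual cost sits where those two effects balance.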

A Few Questions

Does it matter if the hazard rate goes above 100%?

It depends on the asset and the failure modes you have defined. For an asset like a wood pole, only one failure is possible over the life of the asset, so you might cap the failure probability at 100%. On the other hand, an asset like underground cable could have more than one failure in a year, so a failure probability greater than 100% could be interpreted as an expectation of more than one failure. To cap the failure probability, simply find the age at which the probability reaches 100%, and force the probability to 100% for all subsequent ages. This can be done in the asset tool.
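Outside the asset tool, the same capping rule can be sketched as below (the function name is hypothetical). Once any age reaches 100%, every later age is held at 100%, even if the raw value dips again.

```python
def cap_at_100_percent(probabilities):
    """Hold the failure probability at 1.0 from the first age where it reaches 1.0."""
    capped, reached = [], False
    for p in probabilities:
        reached = reached or p >= 1.0  # stays True once the cap has been hit
        capped.append(1.0 if reached else p)
    return capped

print(cap_at_100_percent([0.5, 0.9, 1.05, 0.97, 1.3]))
# → [0.5, 0.9, 1.0, 1.0, 1.0]
```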

My hazard rate goes crazy after a few years, and finally ends in a series of error messages. What's the deal?

When the PDF and the survival curve both get very small, Excel has trouble calculating their ratio accurately, so you can get errors. Fortunately, this generally happens well past the economic life of an asset. It is usually reasonable to simply cap the failure probability at 100% and keep it there for higher ages. This problem can also be resolved by using a Weibull rather than a normal curve, because the hazard rate of a Weibull curve is a power function of age (i.e., K × Age^b), so it can be computed directly instead of as a ratio of two tiny numbers.
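The sketch below reproduces the breakdown in Python, where 1 - CDF rounds to exactly zero far out in the tail, much as it can in a spreadsheet, so the ratio divides by zero. It then shows two ways around it: computing the survival fraction directly with math.erfc (a substitute technique, not one the page describes), and the closed-form Weibull hazard (β/η)·(Age/η)^(β-1), which is K × Age^b with K = β/η^β and b = β - 1. The Weibull shape 4 and scale 11 are hypothetical values that give a mean life near ten years.

```python
import math
from statistics import NormalDist

chairs = NormalDist(10.0, 2.965)  # assumed parameters from the chair example

def naive_hazard(age):
    # Far in the tail, 1 - CDF rounds to exactly 0.0, so this raises ZeroDivisionError.
    return chairs.pdf(age) / (1 - chairs.cdf(age))

def stable_hazard(age):
    # erfc evaluates the upper tail directly, so the survival fraction keeps its precision.
    z = (age - chairs.mean) / chairs.stdev
    survival = 0.5 * math.erfc(z / math.sqrt(2))
    return chairs.pdf(age) / survival

def weibull_hazard(age, shape, scale):
    # Weibull hazard: (shape/scale) * (age/scale)**(shape - 1), i.e. K * age**b.
    return (shape / scale) * (age / scale) ** (shape - 1)

try:
    naive_hazard(40)
except ZeroDivisionError:
    print("naive PDF / (1 - CDF) fails at age 40")
print(f"stable normal hazard at age 40: {stable_hazard(40):.2f}")
print(f"Weibull hazard at age 40: {weibull_hazard(40, shape=4.0, scale=11.0):.2f}")
```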

What if the PDF does not go to zero at the left end of the graph?

This happens when the standard deviation is not small relative to the mean. In most cases, you can simply ignore it; the hazard rate will be calculated just fine. To test this, try entering 10 for the mean and 4 for the min spread (i.e., the lower bound). This is equivalent to saying that the mean life is 10 and half the assets are expected to fail between ages 4 and 16. If you do this, you will see that the PDF is fairly high when it crosses the Y-axis, but there's nothing obviously wrong with the failure probability curve.

Ignoring this effect is probably not defensible from a theoretical perspective. But keep in mind that the approach described here is just a way to get a reasonable failure curve in the short term, while data is collected and analyzed to create curves using a more rigorous methodology.
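The theoretical problem is that a normal curve assigns some probability to failures at negative ages, which cannot happen. The sketch below measures that share for both examples, assuming the usual conversion from Spread, min and range to a standard deviation; the helper name is hypothetical. The share is negligible (about 0.04%) for the 8-to-12 case but roughly 13% for the 4-to-16 case.

```python
from statistics import NormalDist

def share_below_age_zero(mean, spread_min, range_fraction=0.50):
    """Hypothetical helper: probability mass the fitted normal curve puts at negative ages."""
    z = NormalDist().inv_cdf(0.5 + range_fraction / 2)
    sigma = (mean - spread_min) / z
    return NormalDist(mean, sigma).cdf(0)

for spread_min in (8, 4):
    share = share_below_age_zero(10, spread_min)
    print(f"mean 10, min spread {spread_min}: {share:.2%} of failures fall before age 0")
```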

Why use a normal curve? Couldn't the probability distribution be skewed to one side or the other?

Yes, it could be. But if there's not enough data to build a curve analytically, there's surely not enough data to justify choosing one bell-shaped distribution over another. The normal curve has two significant advantages: 1) it requires only two parameters to fully define, and 2) those parameters have intuitive meanings.

Any other words of advice?

You bet:

  • We all tend to overestimate our ability to predict failure. Take a skeptical view of the spread of your distribution. If the standard deviation is less than a third to a quarter of the mean, it's probably too tight. Another way to look at this is that the failure distribution is usually flatter than we think it is.
  • You might wonder whether curves based on this methodology are any good at all. Well, compared to what? If the choice is between a curve created this way and one based on a rigorous statistical analysis, you should take the rigorous one. But if it's between this one and nothing, this one is better. Remember that this curve is based on the collected expertise of your utility's personnel. There's a lot of information captured in that.
  • Even when you have data and you fit curves to it, you should still use the approach described here as a sanity check. Also, consider using a Bayesian approach to combine new curves with these. See also calibrating failure curves for suggestions about combining expert judgment and limited data.
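The rule of thumb in the first bullet is easy to check directly. The sketch below computes the implied standard deviation as a fraction of the mean, assuming the usual conversion from the workshop answers; the thresholds of 1/4 and 1/3 are one reading of "a third to a quarter", and the helper name is hypothetical. For the office-chair inputs the ratio is about 0.30, inside that band, so by this reading the spread is worth a second look.

```python
from statistics import NormalDist

def spread_ratio(mean, spread_min, range_fraction):
    """Hypothetical helper: implied standard deviation divided by the mean."""
    z = NormalDist().inv_cdf(0.5 + range_fraction / 2)
    sigma = (mean - spread_min) / z
    return sigma / mean

ratio = spread_ratio(mean=10, spread_min=8, range_fraction=0.50)

# Assumed reading of "less than a third to a quarter of the mean":
# below 1/4 is flagged as too tight, between 1/4 and 1/3 as worth a second look.
if ratio < 1 / 4:
    verdict = "probably too tight"
elif ratio < 1 / 3:
    verdict = "borderline, worth a second look"
else:
    verdict = "wide enough by this rule of thumb"
print(f"sigma / mean = {ratio:.2f}: {verdict}")
```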

Continue to health index.

