Predictive maintenance: Weibull analysis with Python

You need to predict how many components will fail in the coming months and how many spare parts to buy. Weibull analysis describes this problem statistically, and the calculation is easy to do in Python.

Dr. Marco Berta
4 min read · Oct 19, 2023

If you work in Predictive Maintenance, sooner or later you will come across a funny-looking exponential equation called the “Weibull distribution”, named after Ernst Hjalmar Waloddi Weibull (18 June 1887–12 October 1979). A brilliant materials engineer, he explored the domain of high-cycle fatigue in depth, and today we still apply his findings to failure prediction.

The typical form with 2 parameters is

Weibull probability density function
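Written out with the symbols used in this article (t is the running time, η the scale and β the shape), the standard two-parameter Weibull density and the corresponding cumulative distribution are:

```latex
f(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1}
       \exp\!\left[-\left(\frac{t}{\eta}\right)^{\beta}\right],
\qquad
F(t) = 1 - \exp\!\left[-\left(\frac{t}{\eta}\right)^{\beta}\right],
\qquad t \ge 0
```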

Where “eta” defines the scale and “beta” the shape of the probability function. Eta is the characteristic life: the time by which about 63.2% of the population has failed. In other words, the lower this parameter, the sooner failures occur. Beta instead determines the shape of our distribution. Beta < 1: the failure rate decreases with time, as the weak part of our population is weeded out. Beta = 1: constant failure rate. Beta > 1: the failure rate increases with time as the population ages.
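The three regimes of beta are easiest to see in the failure (hazard) rate, h(t) = (beta/eta) * (t/eta)^(beta - 1). The sketch below evaluates it at a few times; the function names and the value eta = 1000 hours are assumptions chosen for illustration only.

```python
import math

def weibull_cdf(t, eta, beta):
    """Fraction of the population that has failed by time t."""
    return 1.0 - math.exp(-((t / eta) ** beta))

def weibull_hazard(t, eta, beta):
    """Instantaneous failure rate h(t) = (beta / eta) * (t / eta)**(beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1)

eta = 1000.0                           # hypothetical characteristic life, hours
times = [100.0, 500.0, 1000.0, 2000.0]

for beta in (0.5, 1.0, 3.0):           # decreasing, constant, increasing rate
    rates = [weibull_hazard(t, eta, beta) for t in times]
    print(f"beta={beta}: " + ", ".join(f"{r:.5f}" for r in rates))

# At t = eta the CDF equals 1 - exp(-1), about 0.632, whatever beta is.
print(round(weibull_cdf(eta, eta, 3.0), 3))
```

Reading each printed row from left to right shows the rate falling for beta = 0.5, staying flat for beta = 1 and rising for beta = 3.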

Weibull cumulative distributions

Weibull analysis in Python using the “reliability” package

A practical example: I have n projector lamps, and after a given period I observe that a few have burned out due to fabrication defects, improper use during movie screenings, or simply extensive use. I would like to know how many more will fail after my observation period.

The lamps that have not failed (yet) are called “censored” [2]. For those that failed, we record the failure times (e.g. 749.9 hours) and store them in a list or a Pandas dataframe column. The failure times are our data.

fail_times = [749.9, 785.2, 366.2, 1875.7, 1457.3, 1060.9, 755.2, 688.8, 1030.6, 869.5]

An open source Python package named “reliability” provides all the functions needed for our fit.

Fit_Weibull_2P imported from the Python “reliability” library

from reliability.Fitters import Fit_Weibull_2P

analysis = Fit_Weibull_2P(failures=fail_times, right_censored=None, show_probability_plot=True, print_results=True, CI=0.95, CI_type='time', method='MLE')

With the above parameters, Fit_Weibull_2P outputs a summary of the results, such as the fitted parameters and goodness-of-fit metrics (“alpha” in this output corresponds to our “eta”):

Weibull fit output parameters

It also produces a fitted cumulative distribution graph:

Weibull probability cumulative distribution function
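Once eta and beta are known, the original question (how many lamps will fail next, and how many spares to buy) comes down to a conditional probability: the chance that a lamp still working at a given age fails within the planning window. A minimal sketch follows; the parameter values, lamp ages and window length are hypothetical and are not the fit output above.

```python
import math

def weibull_cdf(t, eta, beta):
    """Probability that a unit has failed by time t."""
    return 1.0 - math.exp(-((t / eta) ** beta))

def conditional_failure_prob(age, horizon, eta, beta):
    """P(fails before age + horizon | still working at age)."""
    survive_now = 1.0 - weibull_cdf(age, eta, beta)
    survive_later = 1.0 - weibull_cdf(age + horizon, eta, beta)
    return 1.0 - survive_later / survive_now

# Hypothetical values for illustration only.
eta, beta = 1000.0, 2.5
surviving_ages = [400.0, 400.0, 650.0, 900.0]  # hours already run by working lamps
horizon = 200.0                                # planning window, hours

probs = [conditional_failure_prob(a, horizon, eta, beta) for a in surviving_ages]
expected_failures = sum(probs)                 # expected number of spares needed
print(f"expected failures in the next {horizon:.0f} h: {expected_failures:.2f}")
```

Summing the individual probabilities gives the expected number of failures in the window. With beta > 1 the older lamps contribute more to that sum, which is the ageing effect described earlier.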

How well does the Weibull distribution fit our data?

When fitting, we vary the parameters, and therefore the shape of the curve, until the observed data are as probable as possible under it. The quantity being maximized is the “likelihood”, and the metric is the log-likelihood rather than accuracy (as in classification) or R squared (as in regression). This is not intuitive at all, since “likelihood” and “probability” are commonly used interchangeably, but there is a difference, as explained in a video by Josh Starmer [3] or at the link [4].

While log-likelihood is the main metric, it has limitations when there are many parameters or few samples. For this reason, other metrics add penalties that take these factors into account. The Akaike Information Criterion (AIC) is used with a small number of samples [5] and the Bayesian Information Criterion (BIC) with a large number of samples [6]. For both, a lower value indicates a better fit.
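To make these metrics concrete, here is a from-scratch sketch of a maximum likelihood fit on the ten failure times, followed by the log-likelihood, AIC, its small-sample correction AICc, and BIC. It is not the implementation used by the “reliability” package. The function names and the search bracket for beta are assumptions, and the printed numbers may differ slightly from the library output.

```python
import math

fail_times = [749.9, 785.2, 366.2, 1875.7, 1457.3,
              1060.9, 755.2, 688.8, 1030.6, 869.5]

def log_likelihood(eta, beta, failures, right_censored=()):
    """Weibull log-likelihood with optional right-censored running times."""
    ll = 0.0
    for t in failures:          # failed units contribute the log density
        z = t / eta
        ll += math.log(beta / eta) + (beta - 1.0) * math.log(z) - z ** beta
    for t in right_censored:    # survivors contribute the log survival probability
        ll -= (t / eta) ** beta
    return ll

def fit_weibull_mle(failures, right_censored=()):
    """Maximize the log-likelihood by solving the profile equation for beta."""
    all_times = list(failures) + list(right_censored)
    t_max = max(all_times)
    scaled = [t / t_max for t in all_times]  # values <= 1 avoid overflow in s**beta
    mean_log_fail = sum(math.log(t / t_max) for t in failures) / len(failures)

    def score(beta):
        s0 = sum(s ** beta for s in scaled)
        s1 = sum(s ** beta * math.log(s) for s in scaled)
        return s1 / s0 - 1.0 / beta - mean_log_fail

    lo, hi = 0.01, 50.0         # assumed search bracket for beta
    for _ in range(100):        # bisection; score() increases with beta
        mid = 0.5 * (lo + hi)
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    eta = t_max * (sum(s ** beta for s in scaled) / len(failures)) ** (1.0 / beta)
    return eta, beta

eta_hat, beta_hat = fit_weibull_mle(fail_times)
ll = log_likelihood(eta_hat, beta_hat, fail_times)

k, n = 2, len(fail_times)                        # parameters, observations
aic = 2 * k - 2 * ll
aicc = aic + (2 * k * k + 2 * k) / (n - k - 1)   # small-sample correction
bic = k * math.log(n) - 2 * ll

print(f"eta={eta_hat:.1f} beta={beta_hat:.2f} logL={ll:.2f}")
print(f"AIC={aic:.2f} AICc={aicc:.2f} BIC={bic:.2f}")
```

Passing the running times of lamps that are still working as right_censored changes the estimate: survivors add only survival terms to the likelihood, which pushes eta upward.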

Another metric is the Anderson-Darling test [7]: the AD statistic measures how closely our data follow the assumed Weibull distribution, with smaller values indicating a closer match. I will not go into details in this article, but for those interested, an extensive list of goodness-of-fit tests is available on Wikipedia [8].
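For readers who want to see what the AD statistic computes, the sketch below evaluates the standard Anderson-Darling formula for the ten failure times against two candidate Weibull curves. The parameter pairs are hypothetical values chosen for illustration, and the statistic produced by a given library may be defined or adjusted differently.

```python
import math

fail_times = [749.9, 785.2, 366.2, 1875.7, 1457.3,
              1060.9, 755.2, 688.8, 1030.6, 869.5]

def weibull_cdf(t, eta, beta):
    """Probability that a unit has failed by time t."""
    return 1.0 - math.exp(-((t / eta) ** beta))

def anderson_darling(data, eta, beta):
    """A^2 = -n - (1/n) * sum_i (2i - 1) * [ln F(x_i) + ln(1 - F(x_{n+1-i}))]."""
    x = sorted(data)
    n = len(x)
    total = 0.0
    for i in range(1, n + 1):
        f_low = weibull_cdf(x[i - 1], eta, beta)   # i-th smallest observation
        f_high = weibull_cdf(x[n - i], eta, beta)  # i-th largest observation
        total += (2 * i - 1) * (math.log(f_low) + math.log(1.0 - f_high))
    return -n - total / n

# Hypothetical candidates: one plausible shape, one deliberately poor shape.
ad_plausible = anderson_darling(fail_times, eta=1090.0, beta=2.5)
ad_poor = anderson_darling(fail_times, eta=1090.0, beta=0.8)
print(f"A^2 plausible: {ad_plausible:.3f}, A^2 poor: {ad_poor:.3f}")
```

The statistic weights the tails of the distribution heavily, so a curve that misplaces the earliest or latest failures is penalized strongly. Turning the value into a pass or fail decision requires critical values, which are not computed here.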

*************************************************

The code used for this article is public and available here.

*************************************************

References

  1. CQE Academy, RELIABILITY Explained!, https://www.youtube.com/watch?v=BQXnKpP2lrI
  2. https://reliability.readthedocs.io/en/latest/What%20is%20censored%20data.html
  3. StatQuest with Josh Starmer, Probability is not Likelihood. Find out why!!! https://www.youtube.com/watch?v=pYxNSUDSFH4
  4. Aryan Gupta, Likelihood V.s Probability: What’s The Difference?, https://www.simplilearn.com/tutorials/statistics-tutorial/difference-between-probability-and-likelihood
  5. https://builtin.com/data-science/what-is-aic
  6. Yaokun Lin, Model Selection with AIC & BIC, Medium 10/03/2021
  7. https://web.cortland.edu/matresearch/AndrsDarlSTART.pdf
  8. https://en.wikipedia.org/wiki/Goodness_of_fit


Dr. Marco Berta

Senior Data Scientist @ ZF Wind Power, Ph.D. in Materials Science at Manchester University