As water networks age, many utilities face rising water main break rates and insufficient replacement funds. Machine learning is a promising tool for supporting efficient water pipe replacement decisions. This thesis explores the practical application of machine learning to water pipe failure prediction using a dataset of over 10 million pipe-year records from four countries. Analysis of predictive factors shows that length, age, diameter, material, and failure history are each significant. Two novel relationships with break rate are observed: an inverse linear relationship between break rate and diameter, and a break rate that peaks at a pipe age of around 40 years and then declines over several decades. A method is presented for predicting both the probability of failure and the expected number of failures for a given pipe and time period. It is proposed that, by inferring units, encoding categorical features, and normalizing for differences in utility practices, a single model can generalize across utilities, geographies, and time periods without any utility-specific data cleansing. The model is trained and tested on a leave-one-utility-out basis, with training data drawn from time periods strictly prior to the test data. The resulting area under the Receiver Operating Characteristic curve (AUC-ROC) of over 0.85 and cumulative lift at 10% of over 5.0 demonstrate the model's practical applicability, matching the performance of models trained and tested on each utility’s own data. Within this model, a method of cross-encoding categorical features with numerical features is introduced to enable the integration of datasets from diverse contributors. The thesis also shows how these performance metrics and model outputs apply to common utility decision-making processes for water main replacement.
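To make the two headline metrics concrete, here is a minimal sketch that computes them for a toy ranking of pipe-year records. It is an illustration under stated assumptions, not the thesis's evaluation code. It assumes each record carries a binary failed/not-failed label and a model risk score. It also defines cumulative lift at 10% as the share of all failures captured in the top 10% of records (ranked by score) divided by 0.10. Under that definition, a lift of 5.0 means the riskiest tenth of records holds half of the observed failures. Lift can also be computed by ranking on pipe length rather than record count; that variant is not shown. The function names and toy data are hypothetical.

```python
import numpy as np


def roc_auc(y_true, scores):
    """Rank-based AUC (Mann-Whitney U): the probability that a randomly chosen
    failed record scores higher than a randomly chosen non-failed one.
    Ties count as half."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true]    # scores of records that failed
    neg = scores[~y_true]   # scores of records that did not fail
    # Pairwise comparison of every failed record against every non-failed one
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def cumulative_lift(y_true, scores, fraction=0.10):
    """Assumed definition: share of failures in the top `fraction` of records
    (ranked by descending score) divided by `fraction`, i.e. the share a
    random ranking would be expected to capture."""
    y_true = np.asarray(y_true, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")    # highest risk first
    k = max(1, int(round(fraction * len(scores))))
    captured = y_true[order[:k]].sum() / y_true.sum()
    return captured / fraction


# Hypothetical toy data: 20 pipe-year records, 4 of which failed.
toy_scores = np.linspace(1.0, 0.0, 20)       # distinct, strictly decreasing risk scores
toy_labels = np.zeros(20, dtype=int)
toy_labels[[0, 1, 5, 12]] = 1                # failures at these ranks

print(roc_auc(toy_labels, toy_scores))                # 0.8125
print(cumulative_lift(toy_labels, toy_scores, 0.10))  # 5.0
```

In the toy data, the top two of twenty records contain two of the four failures, giving a lift of 5.0, and 52 of the 64 failed/non-failed pairs are ranked correctly, giving an AUC of 0.8125. The pairwise AUC comparison is quadratic in the number of records. For millions of pipe-year records, a sort-based implementation such as scikit-learn's `roc_auc_score` would be the practical choice.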


Kevin Laven, PhD candidate in Systems Design Engineering

Join online via Microsoft Teams

Attending this seminar will count towards the graduate student seminar attendance milestone!