1 Dpto. Organización Industrial y Gestión de Empresas II, Escuela Técnica Superior de Ingeniería, Universidad de Sevilla (Spain)
2 Cátedra del Agua EMASESA-US
Keywords: Artificial Neural Networks, Pipe failure prediction, Water distribution system.
In this study, an Artificial Neural Network (ANN) is designed to forecast pipe failures in water supply systems. Most pipe failures occur because of the poor condition of the pipes, which is directly related to inadequate management of investment in improving and maintaining the infrastructure. The developed model classifies each pipe individually as failure or non-failure using several factors related to the design and operation of the network, helping companies optimize their replacement plans.
The main contributions are an evaluation of the accuracy of ANNs, in particular as implemented in the machine learning software Weka, and a comparison of the effectiveness of two sampling methods, under-sampling and over-sampling.
ANNs are systems that emulate the functioning of the human brain. Neurons are represented by nodes, and nerve impulses by the weighted sum of the input values of each node. The interconnected nodes are organized in layers: (1) the input layer receives the information (input variables); (2) the output layer generates the class (output variable); and (3) the intermediate or hidden layers process the information.
In each layer, the weighted sum of the outputs of the previous layer (zj) is calculated first. Then, an activation function f(z) converts the input of each node into its output. In this study, the designed ANNs have sigmoid activation functions at each node and different numbers of hidden layers. Learning in an ANN consists of adjusting its parameters (the weights wij), while its structure does not usually vary.
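The layer-by-layer computation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bias terms, the layer sizes (8 inputs, 5 hidden nodes, 1 output) and the random weights are hypothetical and are not taken from the study, which was carried out in Weka.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Propagate one input vector through fully connected sigmoid layers.

    weights[l][j, i] plays the role of w_ij: the weight from node i of the
    previous layer to node j of layer l. The bias terms are an assumption.
    """
    a = np.asarray(x, dtype=float)
    for W, b in zip(weights, biases):
        z = W @ a + b      # weighted sum z_j of the previous layer's outputs
        a = sigmoid(z)     # activation converts each node's input to its output
    return a

# Hypothetical toy network: 8 input factors -> 5 hidden nodes -> 1 output node
rng = np.random.default_rng(0)
weights = [rng.normal(size=(5, 8)), rng.normal(size=(1, 5))]
biases = [np.zeros(5), np.zeros(1)]
x = rng.normal(size=8)
p_failure = forward(x, weights, biases)  # a value in (0, 1)
```

Training would then adjust the entries of `weights` (and `biases`) while the layer structure stays fixed.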
Since we propose a classification system, the confusion matrix is used as the quality metric for the results. This matrix counts, for each class, the number of samples that are correctly or incorrectly classified.
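As a minimal sketch of how such a matrix is built for the two classes considered here, the following Python snippet counts the four possible outcomes and derives the fraction of each class that is correctly classified. The label encoding (1 = failure, 0 = non-failure) and the toy labels are assumptions made for illustration only.

```python
def confusion_matrix(y_true, y_pred):
    """Count correct and incorrect predictions per class (1 = failure, 0 = non-failure)."""
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["TP"] += 1   # failure correctly predicted
        elif t == 1 and p == 0:
            counts["FN"] += 1   # failure missed
        elif t == 0 and p == 1:
            counts["FP"] += 1   # false alarm on a healthy pipe
        else:
            counts["TN"] += 1   # non-failure correctly predicted
    return counts

def per_class_rates(cm):
    """Fraction of each class that is correctly classified."""
    failure_rate = cm["TP"] / (cm["TP"] + cm["FN"])
    non_failure_rate = cm["TN"] / (cm["TN"] + cm["FP"])
    return failure_rate, non_failure_rate

# Made-up labels for eight pipes
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]
cm = confusion_matrix(y_true, y_pred)
failure_rate, non_failure_rate = per_class_rates(cm)
```

Reporting both per-class rates, rather than a single overall accuracy, matters when one class is much rarer than the other.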
A 7-year historical failure database, including various factors that can influence pipe failure, is used to evaluate the performance of the designed ANN. Specifically, the factors used are material, pipe diameter, age, length, connections per kilometre, network type, pressure fluctuation and number of previous failures.
The output variable of our database is highly imbalanced, with 619 failures in 2018 out of 89595 pipe sections (about 0.7%). This is common in water supply databases, where the number of pipe failures is very small compared to the size of the entire network. For this reason, the study compares the use of two sampling techniques to train the ANN. Under-sampling is applied randomly, while over-sampling generates synthetic instances using a 5-nearest-neighbour approach.
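The two sampling strategies can be sketched as follows. This is a hedged illustration, not the procedure used in the study: the over-sampling function follows the general idea of SMOTE-style interpolation between a minority instance and one of its 5 nearest minority neighbours, which is an assumption about how the synthetic instances are generated. It also assumes purely numeric features, so categorical factors such as material or network type would need separate handling. The function names and toy data are hypothetical.

```python
import numpy as np

def random_under_sample(X, y, rng):
    """Keep all minority rows (y == 1) and an equally sized random subset of majority rows."""
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    kept = rng.choice(majority, size=minority.size, replace=False)
    idx = np.concatenate([minority, kept])
    return X[idx], y[idx]

def knn_over_sample(X, y, rng, k=5):
    """Add synthetic minority rows until both classes have the same size (1:1).

    Each synthetic row lies on the segment between a minority row and one of
    its k nearest minority neighbours. Requires more than k minority rows.
    """
    X_min = X[y == 1]
    n_new = int((y == 0).sum() - (y == 1).sum())
    # Pairwise Euclidean distances among minority rows
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                  # a row is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]    # indices of the k nearest rows
    base = rng.integers(0, X_min.shape[0], size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                 # interpolation factor in [0, 1)
    synthetic = X_min[base] + gap * (X_min[picked] - X_min[base])
    X_out = np.vstack([X, synthetic])
    y_out = np.concatenate([y, np.ones(n_new, dtype=y.dtype)])
    return X_out, y_out

# Made-up data: 1000 pipes with 8 numeric factors, 20 of them failures
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 8))
y = np.zeros(1000, dtype=int)
y[:20] = 1
X_under, y_under = random_under_sample(X, y, rng)
X_over, y_over = knn_over_sample(X, y, rng)
```

Under-sampling shrinks the training set to twice the number of failures, whereas over-sampling grows it to twice the number of non-failures, so the two balanced sets differ greatly in size.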
Results suggest that training an ANN with a dataset balanced (1:1) through under-sampling prevents the system from properly learning to distinguish the patterns of the majority class (here, non-failure), while the minority class (failure) is detected with high precision. In contrast, over-sampling (1:1) leads the system to give both classes the same importance. Moreover, runtimes increase with the number of hidden layers, and they are substantially higher with over-sampling because the algorithm is trained on much more data. Results also improve from 1 to 10 hidden layers; however, this does not hold for 50 and 100. Therefore, given the size of our dataset, the best ANN configuration has between 5 and 10 hidden layers.
ANNs are a promising approach to reducing the number of unexpected pipe failures, which cause many problems for management companies and for society as a whole because, in general, these infrastructures are public.
Results demonstrate that the model can correctly predict pipe failures in up to 86.4% of cases. In general, a more accurate pipe failure forecast comes at the cost of worse non-failure predictions. Therefore, the model must be chosen according to the company's strategy and budget. Furthermore, the results confirm that the classes in the training set must be balanced to obtain accurate predictions, whereas the test set must remain untouched for the results to be realistic.
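The point about balancing only the training set can be illustrated with a short sketch. The class counts (619 failures out of 89595 sections) come from the text, but the 70/30 random stratified split, the use of under-sampling and the function names are assumptions made for the example; the abstract does not state how the data were partitioned.

```python
import numpy as np

def stratified_split(y, test_fraction, rng):
    """Return train/test index arrays that preserve the class ratio in both parts."""
    train_idx, test_idx = [], []
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        n_test = int(round(test_fraction * idx.size))
        test_idx.append(idx[:n_test])
        train_idx.append(idx[n_test:])
    return np.concatenate(train_idx), np.concatenate(test_idx)

def balance_by_under_sampling(train_idx, y, rng):
    """Balance the training indices 1:1; the test indices are never passed in."""
    train_min = train_idx[y[train_idx] == 1]
    train_maj = train_idx[y[train_idx] == 0]
    kept = rng.choice(train_maj, size=train_min.size, replace=False)
    return np.concatenate([train_min, kept])

rng = np.random.default_rng(7)
y = np.zeros(89595, dtype=int)
y[:619] = 1   # class counts from the text; which rows fail is made up

train_idx, test_idx = stratified_split(y, 0.3, rng)   # 0.3 is a hypothetical choice
balanced_train_idx = balance_by_under_sampling(train_idx, y, rng)

train_failure_share = y[balanced_train_idx].mean()   # 0.5 after balancing
test_failure_share = y[test_idx].mean()              # stays near the original imbalance
```

Because the test set keeps the original imbalance, the confusion matrix computed on it reflects what a company would observe when applying the model to its real network.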
- Frank, E., Hall, M.A., Witten, I.H.: The WEKA Workbench Online Appendix for “Data Mining: Practical Machine Learning Tools and Techniques.” Morgan Kaufmann Publishers Inc. (2016)
- Sze, V., Chen, Y.H., Yang, T.J., Emer, J.S.: Efficient Processing of Deep Neural Networks: A Tutorial and Survey. In: Proceedings of the IEEE. pp. 2295–2329 (2017)