Prediction of construction accident outcomes based on an imbalanced dataset through integrated resampling techniques and machine learning methods
Engineering, Construction and Architectural Management
ISSN: 0969-9988
Article publication date: 23 June 2022
Issue publication date: 27 November 2023
Abstract
Purpose
Central to the entire discipline of construction safety management is the concept of construction accidents. Although notable progress has been made in safety management applications over the last decades, the construction industry still accounts for a considerable percentage of all workplace fatalities across the world. This study aims to predict occupational accident outcomes based on national data using machine learning (ML) methods coupled with several resampling strategies.
Design/methodology/approach
An occupational accident dataset recorded in Turkey was collected. To deal with the class imbalance between nonfatal and fatal accidents, the dataset was pre-processed with random under-sampling (RUS), random over-sampling (ROS) and the synthetic minority over-sampling technique (SMOTE). In addition, random forest (RF), Naïve Bayes (NB), K-nearest neighbor (KNN) and artificial neural networks (ANNs) were employed as ML methods to predict accident outcomes.
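The three resampling strategies named above can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names, the toy two-feature dataset, the neighbour count `k` and the seeds are all assumptions made for illustration, and the SMOTE-style function covers only the two-class, numeric-feature case.

```python
import random


def _group_by_class(X, y):
    """Collect rows under their class label."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return groups


def random_under_sample(X, y, seed=0):
    """RUS: randomly drop rows until every class matches the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        for row in rng.sample(rows, target):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def random_over_sample(X, y, seed=0):
    """ROS: randomly duplicate rows until every class matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def smote_like(X, y, minority_label, k=2, seed=0):
    """SMOTE-style: add synthetic minority rows by interpolating between a
    minority row and one of its k nearest minority neighbours.
    Assumes two classes, numeric features and at least two minority rows."""
    rng = random.Random(seed)
    minority = [row for row, label in zip(X, y) if label == minority_label]
    n_new = (len(y) - len(minority)) - len(minority)  # majority minus minority
    X_out, y_out = list(X), list(y)
    for _ in range(max(n_new, 0)):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        X_out.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
        y_out.append(minority_label)
    return X_out, y_out


# Hypothetical toy data: 3 "fatal" (1) rows versus 9 "nonfatal" (0) rows.
X = [[float(i), float(i % 3)] for i in range(12)]
y = [1 if i < 3 else 0 for i in range(12)]

X_rus, y_rus = random_under_sample(X, y)
X_ros, y_ros = random_over_sample(X, y)
X_smote, y_smote = smote_like(X, y, minority_label=1)
```

RUS shrinks the majority class, ROS repeats existing minority rows, and the SMOTE-style step creates new minority points between existing ones. In a typical workflow, resampling of this kind is applied to the training split only, so that evaluation data keep the original class ratio.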
Findings
The results showed that the RF outperformed the other methods when the dataset was pre-processed with RUS. The permutation importance results obtained through the RF showed that the number of past accidents in the company, worker's age, material used, number of workers in the company, accident year, and time of the accident were the most significant attributes.
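The idea behind permutation importance can be sketched in a few lines of plain Python: shuffle one attribute at a time and measure how much the model's score drops. The sketch below is an illustration under stated assumptions, not the study's procedure. The scoring metric (accuracy), the number of repeats, the toy data and `toy_model` (a hypothetical stand-in for a fitted RF) are all assumptions.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model predicts the true label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Rebuild the dataset with only this column permuted.
            X_perm = [row[:col] + [value] + row[col + 1:]
                      for row, value in zip(X, shuffled)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: the label depends on column 0 only.
X = [[0.0, 5.0], [0.0, 1.0], [0.0, 3.0], [0.0, 7.0],
     [1.0, 2.0], [1.0, 6.0], [1.0, 4.0], [1.0, 8.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]


def toy_model(row):
    """Stand-in for a fitted classifier; it looks at column 0 only."""
    return int(row[0] >= 0.5)


importances = permutation_importance(toy_model, X, y)
```

Here the column the model relies on receives a positive importance, while the ignored column scores zero. Ranking attributes by this drop is the kind of ordering reported in the findings above.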
Practical implications
The proposed framework can be used on construction sites on a monthly basis to detect workers who have a high probability of experiencing fatal accidents, which can be a valuable decision-making input for safety professionals seeking to reduce the number of fatal accidents.
Social implications
Practitioners and occupational health and safety (OHS) departments of construction firms can focus on the most important attributes identified by analysis results to enhance the workers' quality of life and well-being.
Originality/value
The literature on accident outcome prediction in the construction safety domain is limited in terms of dealing with imbalanced datasets through integrated resampling techniques and ML methods. A novel utilization plan was proposed and enhanced by the analysis results.
Acknowledgements
The authors would like to thank the Republic of Turkey, Social Security Institution (SSI) for its support and for providing the dataset. The authors would like to acknowledge that this paper is submitted in partial fulfillment of the requirements for a PhD degree at Yildiz Technical University.
Citation
Koc, K., Ekmekcioğlu, Ö. and Gurgun, A.P. (2023), "Prediction of construction accident outcomes based on an imbalanced dataset through integrated resampling techniques and machine learning methods", Engineering, Construction and Architectural Management, Vol. 30 No. 9, pp. 4486-4517. https://doi.org/10.1108/ECAM-04-2022-0305
Publisher
Emerald Publishing Limited
Copyright © 2022, Emerald Publishing Limited