Data repair of density-based data cleaning approach using conditional functional dependencies
Data Technologies and Applications
ISSN: 2514-9288
Article publication date: 19 November 2021
Issue publication date: 22 June 2022
Abstract
Purpose
Data quality is a major challenge in data management. For organizations, the cleanliness of data is a significant problem that affects many business activities. Errors in data occur for different reasons, such as violations of business rules. Because of the huge amount of data, however, manual cleaning alone is infeasible, so methods are required that automatically detect errors and then repair and clean the dirty data. The purpose of this work is to extend the density-based data cleaning approach using conditional functional dependencies to achieve better data repair.
Design/methodology/approach
A set of conditional functional dependencies is introduced as an input to the density-based data cleaning algorithm. The algorithm repairs inconsistent data using this set.
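To make the idea of a CFD set as repair input concrete, the Python sketch below shows one possible way to encode conditional functional dependencies and use them to repair a toy relation. It is not the authors' density-based algorithm: the `CFD` class and the `matches_lhs` and `repair` functions are hypothetical, and the repair rule for variable CFDs (replace conflicting right-hand-side values with the most frequent value in each group) is only a crude stand-in for a density criterion.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

WILDCARD = "_"  # an unnamed variable in a pattern tuple


@dataclass
class CFD:
    """Hypothetical encoding of a CFD (X -> A, tp): lhs attributes X,
    rhs attribute A, and one pattern tuple tp over X plus A."""
    lhs: tuple
    rhs: str
    pattern: dict  # attribute -> constant or WILDCARD


def matches_lhs(row, cfd):
    # A tuple matches tp on X if it agrees with every constant in tp[X].
    return all(cfd.pattern[a] == WILDCARD or row[a] == cfd.pattern[a]
               for a in cfd.lhs)


def repair(rows, cfds):
    """Return a repaired copy of rows and the number of changed cells."""
    rows = [dict(r) for r in rows]  # never modify the caller's data
    changes = 0
    for cfd in cfds:
        target = cfd.pattern[cfd.rhs]
        candidates = [r for r in rows if matches_lhs(r, cfd)]
        if target != WILDCARD:
            # Constant CFD: matching tuples must carry the pattern constant.
            for r in candidates:
                if r[cfd.rhs] != target:
                    r[cfd.rhs] = target
                    changes += 1
        else:
            # Variable CFD: matching tuples that agree on X must agree on A.
            groups = defaultdict(list)
            for r in candidates:
                groups[tuple(r[a] for a in cfd.lhs)].append(r)
            for group in groups.values():
                # Assumption: the most frequent value is taken as correct.
                majority, _ = Counter(r[cfd.rhs] for r in group).most_common(1)[0]
                for r in group:
                    if r[cfd.rhs] != majority:
                        r[cfd.rhs] = majority
                        changes += 1
    return rows, changes


# Toy relation: country code (CC), area code (AC) and city.
rows = [
    {"CC": "01", "AC": "908", "city": "MH"},
    {"CC": "01", "AC": "908", "city": "MH"},
    {"CC": "01", "AC": "908", "city": "NYC"},
    {"CC": "44", "AC": "131", "city": "EDI"},
    {"CC": "44", "AC": "131", "city": "GLA"},
]
cfds = [
    # Constant CFD: in country 44, area code 131 means city EDI.
    CFD(("CC", "AC"), "city", {"CC": "44", "AC": "131", "city": "EDI"}),
    # Variable CFD: in country 01, the area code determines the city.
    CFD(("CC", "AC"), "city", {"CC": "01", "AC": WILDCARD, "city": WILDCARD}),
]
repaired, n_changes = repair(rows, cfds)
# Cities after repair: MH, MH, MH, EDI, EDI (two cells changed)
```

Keeping constant and variable patterns in one `CFD` type mirrors the usual distinction in the CFD literature: a constant pattern can be checked one tuple at a time, while a variable pattern needs pairs or groups of tuples.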
Findings
This new approach was evaluated through experiments on real-world as well as synthetic datasets. The repair quality was determined using the F-measure. The results showed that the quality and scalability of the density-based data cleaning approach improved when conditional functional dependencies were introduced.
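The abstract does not say how precision and recall were computed. The sketch below assumes the cell-level definitions commonly used to evaluate data repair: precision is the fraction of changed cells whose new value is correct, recall is the fraction of erroneous cells that were correctly repaired, and F = 2PR/(P + R). The function name `repair_f_measure` and the toy data are hypothetical.

```python
def repair_f_measure(dirty, repaired, clean):
    """Hypothetical helper: cell-level precision, recall and F-measure.
    dirty, repaired and clean are position-aligned lists of dicts."""
    changed = errors = correct = 0
    for d, r, c in zip(dirty, repaired, clean):
        for attr in c:
            if r[attr] != d[attr]:
                changed += 1          # the repair touched this cell
                if r[attr] == c[attr]:
                    correct += 1      # ...and set it to the true value
            if d[attr] != c[attr]:
                errors += 1           # the cell was dirty to begin with
    precision = correct / changed if changed else 0.0
    recall = correct / errors if errors else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


clean = [{"city": "MH"}, {"city": "EDI"}, {"city": "EDI"}]
dirty = [{"city": "NYC"}, {"city": "GLA"}, {"city": "EDI"}]
repaired = [{"city": "MH"}, {"city": "EDI"}, {"city": "LDN"}]
precision, recall, f_measure = repair_f_measure(dirty, repaired, clean)
# precision ≈ 0.667 (2 of 3 changes correct), recall = 1.0, F ≈ 0.8
```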
Originality/value
Conditional functional dependencies capture semantic errors among data values. This work demonstrates that the density-based data cleaning approach can be improved in terms of repairing inconsistent data by using conditional functional dependencies.
Citation
Al-Janabi, S. and Janicki, R. (2022), "Data repair of density-based data cleaning approach using conditional functional dependencies", Data Technologies and Applications, Vol. 56 No. 3, pp. 429-446. https://doi.org/10.1108/DTA-05-2021-0108
Publisher: Emerald Publishing Limited
Copyright © 2021, Emerald Publishing Limited