Accelerating the Discovery of Data Quality Rules: A Case Study
DOI:
https://doi.org/10.1609/aaai.v25i2.18865
Abstract
Poor-quality data is a growing and costly problem that affects many enterprises across all aspects of their business, ranging from operational efficiency to revenue protection. In this paper, we present an application – Data Quality Rules Accelerator (DQRA) – that accelerates Data Quality (DQ) efforts (e.g., data profiling and cleansing) by automatically discovering DQ rules for detecting inconsistencies in data. We then present two evaluations. The first compares DQRA to existing solutions and shows that DQRA either outperformed them or achieved comparable performance on metrics such as precision, recall, and runtime. The second is a case study in which DQRA was piloted at a large utilities company to improve data quality as part of a legacy migration effort. DQRA discovered rules that detected data inconsistencies directly impacting revenue and operational efficiency. Moreover, DQRA significantly reduced the effort required to develop these rules compared to the state of the practice. Finally, we describe ongoing efforts to deploy DQRA.