Access Restriction Subscribed

Author Chiang, Fei ♦ Sitaramachandran, Siddharth
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Copyright Year ©2016
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword Data quality ♦ Constraint repair ♦ Data repair
Abstract Integrity constraints play an important role in data design. However, in an operational database, they may not be enforced for many reasons. Hence, over time, data may become inconsistent with respect to the constraints. To manage this, several approaches have proposed techniques to repair the data by finding minimal or lowest cost changes to the data that make it consistent with the constraints. Such techniques are appropriate for applications where only the data changes, but schemas and their constraints remain fixed. In many modern applications, however, constraints may evolve over time as application or business rules change, as data are integrated with new data sources or as the underlying semantics of the data evolves. In such settings, when an inconsistency occurs, it is no longer clear if there is an error in the data (and the data should be repaired) or if the constraints have evolved (and the constraints should be repaired). In this work, we present a novel unified cost model that allows data and constraint repairs to be compared on an equal footing. We consider repairs over a database that is inconsistent with respect to a set of rules, modeled as functional dependencies (FDs). FDs are the most common type of constraint and are known to play an important role in maintaining data quality. We propose modifications to the data and to the FDs such that the data and the constraints are better aligned. We evaluate the quality and scalability of our repair algorithms over synthetic and real datasets. The results show that our repair algorithms not only scale well for large datasets but also are able to accurately capture and correct inconsistencies and accurately decide when a data repair versus a constraint repair is best.
Description Author Affiliation: McMaster University, Ontario, Canada (Chiang, Fei; Sitaramachandran, Siddharth)
ISSN 1936-1955
Age Range 18 to 22 years ♦ above 22 years
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2016-08-01
Publisher Place New York
e-ISSN 1936-1963
Journal Journal of Data and Information Quality (JDIQ)
Volume Number 7
Issue Number 3
Page Count 26
Starting Page 1
Ending Page 26

