Access Restriction Subscribed

Author De, Sushovan ♦ Hu, Yuheng ♦ Meduri, Venkata Vamsikrishna ♦ Chen, Yi ♦ Kambhampati, Subbarao
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Copyright Year ©2016
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword Data quality ♦ Offline and online cleaning ♦ Statistical data cleaning
Abstract Recent efforts in data cleaning of structured data have focused exclusively on problems like data deduplication, record matching, and data standardization; none of the approaches addressing these problems focus on fixing incorrect attribute values in tuples. Correcting values in tuples is typically performed by a minimum cost repair of tuples that violate static constraints like Conditional Functional Dependencies (which have to be provided by domain experts or learned from a clean sample of the database). In this article, we provide a method for correcting individual attribute values in a structured database using a Bayesian generative model and a statistical error model learned from the noisy database directly. We thus avoid the necessity for a domain expert or clean master data. We also show how to efficiently perform consistent query answering using this model over a dirty database, in case write permissions to the database are unavailable. We evaluate our methods over both synthetic and real data.
Description Author Affiliation: New Jersey Institute of Technology, Newark, NJ (Chen, Yi); University of Illinois at Chicago (Hu, Yuheng); Arizona State University, Tempe, AZ (Meduri, Venkata Vamsikrishna; Kambhampati, Subbarao); Arizona State University (De, Sushovan)
ISSN 19361955
Age Range 18 to 22 years ♦ above 22 years
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2016-10-01
Publisher Place New York
e-ISSN 19361963
Journal Journal of Data and Information Quality (JDIQ)
Volume Number 8
Issue Number 1
Page Count 30
Starting Page 1
Ending Page 30

