Access Restriction Subscribed

Author Tremblay, Monica Chiarini ♦ Dutta, Kaushik ♦ Vandermeer, Debra
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Copyright Year ©2010
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword Data quality ♦ Missing data ♦ Pattern discovery
Abstract In today’s data-rich environment, decision makers draw conclusions from data repositories that may contain data quality problems. In this context, missing data is an important and known problem, since it can seriously affect the accuracy of conclusions drawn. Researchers have described several approaches for dealing with missing data, primarily attempting to infer values or estimate the impact of missing data on conclusions. However, few have considered approaches to characterize patterns of bias in missing data, that is, to determine the specific attributes that predict the missingness of data values. Knowledge of the specific systematic bias patterns in the incidence of missing data can help analysts more accurately assess the quality of conclusions drawn from data sets with missing data. This research proposes a methodology to combine a number of Knowledge Discovery and Data Mining techniques, including association rule mining, to discover patterns in related attribute values that help characterize these bias patterns. We demonstrate the efficacy of our proposed approach by applying it to a demo census dataset seeded with biased missing data. The experimental results show that our approach was able to find seeded biases and filter out most seeded noise.
ISSN 1936-1955
Age Range 18 to 22 years ♦ above 22 years
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2010-07-01
Publisher Place New York
e-ISSN 1936-1963
Journal Journal of Data and Information Quality (JDIQ)
Volume Number 2
Issue Number 1
Page Count 19
Starting Page 1
Ending Page 19