Access Restriction Subscribed

Author Fisher, Craig W ♦ Lauria, Eitel J M ♦ Matheus, Carolyn C
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Copyright Year ©2010
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword Data and information quality ♦ Complexity ♦ Randomness
Abstract Practitioners and researchers regularly refer to error rates or accuracy percentages of databases. The former is the number of cells in error divided by the total number of cells; the latter is the number of correct cells divided by the total number of cells. However, databases may have similar error rates (or accuracy percentages) but differ drastically in the complexity of their accuracy problems. A simple percentage does not indicate whether the errors are systematic or randomly distributed throughout the database. We expand the accuracy metric to include a randomness measure and a probability distribution value. The proposed randomness check is based on the Lempel-Ziv (LZ) complexity measure. Through two simulation studies we show that the LZ complexity measure can clearly differentiate whether the errors are random or systematic. This determination is a significant first step and a major departure from the percentage-alone technique. Once the errors are determined to be random, a probability distribution, Poisson, is used to help address various managerial questions.
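The abstract's core idea can be sketched in a few lines. The following is an illustrative example only, not the authors' code: it uses a common LZ76-style phrase-counting complexity (each phrase is the shortest extension of the scan not already seen in the preceding text) on a binary error indicator per cell. Two hypothetical databases share the same 5% error rate, but periodic (systematic) errors yield a much lower complexity than randomly scattered ones.

```python
import random

def error_rate(cells_in_error, total_cells):
    # Error rate as defined in the abstract: cells in error / total cells.
    return cells_in_error / total_cells

def lz_complexity(bits):
    # Lempel-Ziv (1976) complexity: count the distinct phrases found when
    # scanning left to right, where each phrase is the shortest extension
    # not already contained in the text preceding its end.
    s = ''.join('1' if b else '0' for b in bits)
    n, c, i = len(s), 0, 0
    while i < n:
        l = 1
        while i + l <= n and s[i:i + l] in s[:i + l - 1]:
            l += 1
        c += 1          # s[i:i+l] is a new phrase
        i += l
    return c

# Same error rate (5%), very different structure (hypothetical data).
random.seed(42)
n = 1000
random_errors = [1 if random.random() < 0.05 else 0 for _ in range(n)]
systematic_errors = [1 if i % 20 == 19 else 0 for i in range(n)]  # every 20th cell

print(error_rate(sum(random_errors), n))       # roughly 0.05
print(error_rate(sum(systematic_errors), n))   # exactly 0.05
print(lz_complexity(random_errors))       # high: errors look random
print(lz_complexity(systematic_errors))   # low: periodic errors compress
```

The point mirrors the paper's claim: the percentage alone cannot separate the two cases, while the complexity score can. (The paper's own formulation of the LZ measure and its thresholds may differ from this sketch.)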
ISSN 19361955
Age Range 18 to 22 years ♦ above 22 years
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2009-12-01
Publisher Place New York
e-ISSN 19361963
Journal Journal of Data and Information Quality (JDIQ)
Volume Number 1
Issue Number 3
Page Count 21
Starting Page 1
Ending Page 21