Access Restriction Subscribed

Author Dash, Manoranjan ♦ Singhania, Ayush
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Copyright Year ©2009
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword Information filtering ♦ Association rule mining ♦ Classification ♦ Clustering ♦ Data mining ♦ Sampling ♦ Selection process
Abstract In this article we address the issue of how to mine efficiently in large and noisy data. We propose an efficient sampling algorithm (Concise) as a solution for large and noisy data. Concise is far superior to Simple Random Sampling (SRS) in selecting a representative sample. Concise achieves its maximum gain over SRS when the data is very large and noisy. The comparison is in terms of their impact on subsequent data mining tasks, specifically classification, clustering, and association rule mining. We compared Concise with a few existing noise removal algorithms followed by SRS. Although the accuracy of the mining results is similar, Concise spends very little time compared to the existing algorithms because Concise has linear time complexity.
ISSN 19361955
Age Range 18 to 22 years ♦ above 22 years
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2009-09-01
Publisher Place New York
e-ISSN 19361963
Journal Journal of Data and Information Quality (JDIQ)
Volume Number 1
Issue Number 2
Page Count 30
Starting Page 1
Ending Page 30

