Author Jinze Liu ♦ Paulsen, S. ♦ Wei Wang ♦ Nobel, A. ♦ Prins, J.
Source IEEE Xplore Digital Library
Content type Text
Publisher Institute of Electrical and Electronics Engineers, Inc. (IEEE)
File Format PDF
Copyright Year ©2005
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Computer programming, programs & data
Subject Keyword Data mining ♦ Itemsets ♦ Data analysis ♦ Computer science ♦ Statistical analysis ♦ Operations research ♦ Application software ♦ Context modeling ♦ Relational databases ♦ Association rules
Abstract Frequent itemset mining is a popular and important first step in analyzing data sets across a broad range of applications. The traditional, "exact" approach for finding frequent itemsets requires that every item in the itemset occurs in each supporting transaction. However, real data is typically subject to noise, and in the presence of such noise, traditional itemset mining may fail to detect relevant itemsets, particularly those large itemsets that are more vulnerable to noise. In this paper we propose approximate frequent itemsets (AFI) as a noise-tolerant itemset model. In addition to the usual requirement for sufficiently many supporting transactions, the AFI model places constraints on the fraction of errors permitted in each item column and the fraction of errors permitted in a supporting transaction. Taken together, these constraints winnow out the approximate itemsets that exhibit systematic errors. In the context of a simple noise model, we demonstrate that AFI is better at recovering underlying data patterns, while identifying fewer spurious patterns than either the exact frequent itemset approach or the existing error-tolerant itemset approach of Yang et al.
Description Author affiliation: Dept. of Comput. Sci., North Carolina Univ., Chapel Hill, NC, USA (Jinze Liu; Paulsen, S.; Wei Wang; Nobel, A.)
ISBN 0769522785
ISSN 15504786
Educational Role Student ♦ Teacher
Age Range Above 22 years
Educational Use Research ♦ Reading
Education Level Undergraduate (UG) and postgraduate (PG)
Learning Resource Type Article
Publisher Date 2005-11-27
Publisher Place USA
Rights Holder Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Size 304.98 kB

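Illustrative note on the abstract: the two error constraints it describes can be sketched as a simple check on a candidate set of supporting transactions and a candidate itemset. This is only a minimal sketch based on the abstract's wording, not the paper's algorithm. The function names (`is_afi`, `exact_support`), the parameter names (`eps_row`, `eps_col`, `min_support`), the use of an absolute support count, and the toy data are all assumptions made for illustration.

```python
from typing import Sequence, Set


def exact_support(transactions: Sequence[Set[str]], items: Sequence[str]) -> int:
    """Count transactions that contain every item (the traditional, exact notion)."""
    return sum(1 for t in transactions if all(i in t for i in items))


def is_afi(
    transactions: Sequence[Set[str]],
    rows: Sequence[int],
    items: Sequence[str],
    eps_row: float,
    eps_col: float,
    min_support: int,
) -> bool:
    """Check the three conditions named in the abstract (hypothetical thresholds).

    rows        -- indices of the candidate supporting transactions
    eps_row     -- max fraction of the itemset missing from any one transaction
    eps_col     -- max fraction of supporting transactions missing any one item
    min_support -- minimum number of supporting transactions (absolute count)
    """
    if not rows or not items or len(rows) < min_support:
        return False
    # Per-transaction constraint: each supporting row may miss few of the items.
    for r in rows:
        missing = sum(1 for i in items if i not in transactions[r])
        if missing / len(items) > eps_row:
            return False
    # Per-item constraint: each item column may be missing from few of the rows.
    for i in items:
        missing = sum(1 for r in rows if i not in transactions[r])
        if missing / len(rows) > eps_col:
            return False
    return True


items = ["a", "b", "c", "d"]
rows = [0, 1, 2, 3]

# Scattered noise: each of three transactions misses a different item.
scattered = [{"a", "b", "c", "d"}, {"a", "b", "c"}, {"a", "b", "d"}, {"b", "c", "d"}, {"e"}]

# Systematic error: the same item "d" is missing from three of four transactions.
systematic = [{"a", "b", "c", "d"}, {"a", "b", "c"}, {"a", "b", "c"}, {"a", "b", "c"}, {"e"}]

print(exact_support(scattered, items))                    # 1
print(is_afi(scattered, rows, items, 0.25, 0.25, 4))      # True
print(is_afi(systematic, rows, items, 0.25, 0.25, 4))     # False
```

In the toy data, exact mining sees only one fully supporting transaction, while the check accepts the scattered-noise case and rejects the case where one item column carries most of the errors. That matches the abstract's point that the per-row and per-column constraints together winnow out systematic errors.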