Access Restriction Subscribed

Author Iwen, M.A. ♦ Lang, W. ♦ Patel, J.M.
Source IEEE Xplore Digital Library
Content type Text
Publisher Institute of Electrical and Electronics Engineers, Inc. (IEEE)
File Format PDF
Copyright Year ©2008
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword Gene expression ♦ Cancer ♦ Association rules ♦ Data mining ♦ Classification tree analysis ♦ Training data ♦ Costs ♦ Support vector machines ♦ Support vector machine classification ♦ Mathematics
Abstract Current state-of-the-art association rule-based classifiers for gene expression data operate in two phases: (i) Association rule mining from training data followed by (ii) Classification of query data using the mined rules. In the worst case, these methods require an exponential search over the subset space of the training data set's samples and/or genes during at least one of these two phases. Hence, existing association rule-based techniques are prohibitively computationally expensive on large gene expression datasets. Our main result is the development of a heuristic rule-based gene expression data classifier called Boolean Structure Table Classification (BSTC). BSTC is explicitly related to association rule-based methods, but is guaranteed to be polynomial space/time. Extensive cross validation studies on several real gene expression datasets demonstrate that BSTC retains the classification accuracy of current association rule-based methods while being orders of magnitude faster than the leading classifier RCBT on large datasets. As a result, BSTC is able to finish table generation and classification on large datasets for which current association rule-based methods become computationally infeasible. BSTC also enjoys two other advantages over association rule-based classifiers: (i) BSTC is easy to use (requires no parameter tuning), and (ii) BSTC can easily handle datasets with any number of class types. Furthermore, in the process of developing BSTC we introduce a novel class of Boolean association rules which have potential applications to other data mining problems.
Description Author affiliation: Dept. of Math., Univ. of Michigan, Ann Arbor, MI (Iwen, M.A.)
ISBN 9781424418367
Educational Role Student ♦ Teacher
Age Range above 22 years
Educational Use Research ♦ Reading
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2008-04-07
Publisher Place Mexico
Rights Holder Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Size 1.01 MB
Page Count 10
Starting Page 1062
Ending Page 1071