Access Restriction Open

Author Traina, Caetano ♦ Traina, Agma ♦ Wu, Leejay ♦ Faloutsos, Christos
Source CiteSeerX
Content type Text
File Format PDF
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword Good Result ♦ Fractal Dimension ♦ Fast Feature Selection ♦ Dimensionality Curse ♦ Easy Interpretation ♦ Good Estimate ♦ Important Attribute ♦ Scalable Algorithm ♦ Intrinsic Dimension ♦ Data Mining ♦ Multimedia Indexing ♦ Many Attribute ♦ Good Approximation ♦ Machine Learning ♦ High Interest ♦ Synthetic Datasets ♦ Dimensionality Reduction ♦ N-dimensional Vector ♦ Nonlinear Correlation ♦ Constant Number ♦ Desirable Property
Description Dimensionality curse and dimensionality reduction are two issues that have retained high interest for data mining, machine learning, multimedia indexing, and clustering. We present a fast, scalable algorithm to quickly select the most important attributes (dimensions) for a given set of n-dimensional vectors. In contrast to older methods, our method has the following desirable properties: (a) it does not do rotation of attributes, thus leading to easy interpretation of the resulting attributes; (b) it can spot attributes that have either linear or nonlinear correlations; (c) it requires a constant number of passes over the dataset; (d) it gives a good estimate on how many attributes should be kept. The idea is to use the 'fractal' dimension of a dataset as a good approximation of its intrinsic dimension, and to drop attributes that do not affect it. We applied our method on real and synthetic datasets, where it gave fast and good results.
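The idea in the abstract can be sketched in code. Below is a minimal, illustrative Python sketch (not the paper's optimized algorithm): it estimates the correlation fractal dimension of a point set by box counting, then greedily drops the attribute whose removal changes that dimension least, stopping when any further removal would change it by more than a tolerance. The function names, grid levels, and `tol` threshold are illustrative assumptions, not values from the paper.

```python
import numpy as np

def correlation_fractal_dimension(X, levels=range(2, 6)):
    """Estimate the correlation fractal dimension D2 by box counting.

    For each grid cell size r, compute S(r) = sum of squared cell
    occupancy fractions; D2 is the slope of log S(r) versus log r.
    Grid levels are an illustrative choice, not the paper's.
    """
    # Normalise each attribute to [0, 1] so one grid fits all columns.
    X = (X - X.min(axis=0)) / (np.ptp(X, axis=0) + 1e-12)
    log_r, log_s = [], []
    for lvl in levels:
        r = 1.0 / (2 ** lvl)
        cells = np.floor(X / r).astype(np.int64)       # cell index per point
        _, counts = np.unique(cells, axis=0, return_counts=True)
        s = np.sum((counts / len(X)) ** 2)             # S(r)
        log_r.append(np.log(r))
        log_s.append(np.log(s))
    slope, _ = np.polyfit(log_r, log_s, 1)             # least-squares slope
    return slope

def select_features(X, tol=0.25):
    """Backward elimination guided by the fractal dimension.

    Repeatedly drop the attribute whose removal changes the estimated
    fractal dimension least; stop once every candidate removal would
    change it by more than `tol` (redundant attributes barely move the
    intrinsic dimension, important ones lower it noticeably).
    """
    keep = list(range(X.shape[1]))
    base = correlation_fractal_dimension(X)
    while len(keep) > 1:
        best, best_d = None, None
        for j in keep:
            cols = [c for c in keep if c != j]
            d = correlation_fractal_dimension(X[:, cols])
            if best is None or abs(d - base) < abs(best_d - base):
                best, best_d = j, d
        if abs(best_d - base) > tol:
            break                                       # all remaining attributes matter
        keep.remove(best)
    return keep
```

For example, on a synthetic dataset where a third attribute is a nonlinear function of the first (so the intrinsic dimension is 2, not 3), the selector keeps only two attributes. Note this naive sketch re-scans the data for every candidate attribute; the paper's contribution is doing the selection scalably, in a constant number of passes.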
Educational Role Student ♦ Teacher
Age Range above 22 years
Educational Use Research
Education Level UG and PG ♦ Career/Technical Study
Learning Resource Type Article
Publisher Date 2000-01-01
Publisher Institution XV Brazilian Symposium on Databases (SBBD)