Access Restriction Open

Author Chiu, Bill ♦ Keogh, Eamonn ♦ Lonardi, Stefano
Source CiteSeerX
Content type Text
File Format PDF
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword Care Symbol ♦ Anytime Algorithm ♦ Novel Algorithm ♦ Poor Scalability ♦ Likely Candidate Motif ♦ Core Task ♦ Time Series ♦ Algorithm Fast ♦ Motif Discovery Algorithm ♦ High Probability ♦ Recent Advance ♦ Time Series Motif ♦ Pattern Discovery ♦ Probabilistic Discovery
Description Several important time series data mining problems reduce to the core task of finding approximately repeated subsequences in a longer time series. In an earlier work, we formalized the idea of approximately repeated subsequences by introducing the notion of time series motifs. Two limitations of this work were the poor scalability of the motif discovery algorithm, and the inability to discover motifs in the presence of noise. Here we address these limitations by introducing a novel algorithm inspired by recent advances in the problem of pattern discovery in biosequences. Our algorithm is probabilistic in nature, but as we show empirically and theoretically, it can find time series motifs with very high probability even in the presence of noise or “don’t care” symbols. Not only is the algorithm fast, but it is an anytime algorithm, producing likely candidate motifs almost immediately, and gradually improving the quality of results over time.
Educational Role Student ♦ Teacher
Age Range above 22 years
Educational Use Research
Education Level UG and PG ♦ Career/Technical Study
Learning Resource Type Article
Publisher Date 2003-01-01