Access Restriction Subscribed

Author Flajolet, Philippe ♦ Szpankowski, Wojciech ♦ Vallée, Brigitte
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Copyright Year ©2006
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword De Bruijn graph ♦ Pattern matching ♦ Combinatorial calculus ♦ Complex asymptotics ♦ Discrete probability ♦ Distributional analysis ♦ Generating functions ♦ Languages ♦ Subsequences ♦ Words
Abstract We consider the sequence comparison problem, also known as the "hidden" pattern problem, where one searches for a given subsequence in a text (rather than a string, understood as a sequence of consecutive symbols). A characteristic parameter is the number of occurrences of a given pattern $w$ of length $m$ as a subsequence in a random text of length $n$ generated by a memoryless source. Spacings between letters of the pattern may either be constrained or not in order to define valid occurrences. We determine the mean and the variance of the number of occurrences, and establish a Gaussian limit law and large deviations. These results are obtained via combinatorics on words, formal language techniques, and methods of analytic combinatorics based on generating functions. The motivations to study this problem come from an attempt at finding a reliable threshold for intrusion detection, from textual data processing applications, and from molecular biology.
ISSN 00045411
Age Range 18 to 22 years ♦ above 22 years
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2006-01-01
Publisher Place New York
e-ISSN 1557735X
Journal Journal of the ACM (JACM)
Volume Number 53
Issue Number 1
Page Count 37
Starting Page 147
Ending Page 183
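
Illustration The parameter described in the abstract, the number of occurrences of a pattern $w$ as a subsequence of a text, can be made concrete with a short sketch. The code below is not taken from the article; the function names are hypothetical, and it covers only the unconstrained case (no restriction on the spacings between pattern letters). The mean is computed by linearity of expectation over the $\binom{n}{m}$ choices of positions, assuming a memoryless source with the given letter probabilities.

```python
from math import comb, prod


def count_hidden_occurrences(text, pattern):
    """Count occurrences of `pattern` as a subsequence of `text`.

    Unconstrained case only: any gaps between matched letters are allowed.
    """
    # ways[j] = number of ways to embed the first j pattern letters
    # in the part of the text read so far.
    ways = [1] + [0] * len(pattern)
    for ch in text:
        # Scan j from right to left so that one text letter is never
        # matched to two pattern positions within the same occurrence.
        for j in range(len(pattern), 0, -1):
            if pattern[j - 1] == ch:
                ways[j] += ways[j - 1]
    return ways[len(pattern)]


def expected_hidden_occurrences(n, pattern, probs):
    """Mean count in a random text of length n (memoryless source).

    `probs` maps each letter to its probability (an assumed input format).
    Each of the C(n, m) position sets matches with probability prod p(w_i).
    """
    m = len(pattern)
    return comb(n, m) * prod(probs[c] for c in pattern)
```

For example, `count_hidden_occurrences("abab", "ab")` counts the three position pairs (1,2), (1,4) and (3,4), and `expected_hidden_occurrences(4, "ab", {"a": 0.5, "b": 0.5})` evaluates $\binom{4}{2} \cdot 0.25$. The constrained case, the variance, and the limit laws studied in the article are not covered by this sketch.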

