Thumbnail
Access Restriction
Subscribed

Author Breimer, Eric A. ♦ Goldberg, Mark K. ♦ Lim, Darren T.
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Copyright Year ©2003
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Computer programming, programs & data
Abstract We present an experimental study of a learning algorithm for the longest common subsequence problem, $\textit{LCS}.$ Given an arbitrary input domain, the algorithm learns an $\textit{LCS}-procedure$ tailored to that domain. The learning is done with the help of an oracle, which can be any $\textit{LCS}-algorithm.$ After solving a limited number of training inputs using an oracle, the learning algorithm outputs a new $\textit{LCS}-procedure.Our$ experiments demonstrate that, by allowing a slight loss of optimality, learning yields a procedure which is significantly faster than the oracle. The oracle used for the experiments is the $\textit{np}-procedure$ by Wu et al., a modification of Myers' classical $\textit{LCS}-algorithm.$ We show how to scale up the results of learning on small inputs to inputs of arbitrary lengths. For the domain of two random 2-symbol inputs of length $\textit{n},$ learning yields a program with 0.999 expected accuracy, which runs in $O(n^{1.41})-time,$ in contrast with $O(n^{2}/log$ $\textit{n})$ running time of the fastest theoretical algorithm that produces optimal solutions. For the domain of random 2-symbol inputs of length 100,000, the program runs 10.5 times faster than the $\textit{np}-procedure,$ producing 0.999- accurate outputs. The scaled version of the evolved algorithm applied to random inputs of length 1 million runs approximately 30 times faster than the $\textit{np}-procedure$ while constructing 0.999- accurate solutions. We apply the evolved algorithm to DNA sequences of various lengths by training on random 4-symbol sequences of up to length 10,000. The evolved algorithm, scaled up to the lengths of up to 1.8 million, produces solutions with the 0.998-accuracy in a fraction of the time used by the $\textit{np}.$
ISSN 10846654
Age Range 18 to 22 years ♦ above 22 year
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2003-12-01
Publisher Place New York
e-ISSN 10846654
Journal Journal of Experimental Algorithmics (JEA)
Volume Number 8


Open content in new tab

   Open content in new tab
Source: ACM Digital Library