### A learning algorithm for the longest common subsequence problemA learning algorithm for the longest common subsequence problem

Access Restriction
Subscribed

 Author Breimer, Eric A. ♦ Goldberg, Mark K. ♦ Lim, Darren T. Source ACM Digital Library Content type Text Publisher Association for Computing Machinery (ACM) File Format PDF Copyright Year ©2003 Language English
 Subject Domain (in DDC) Computer science, information & general works ♦ Computer programming, programs & data Abstract We present an experimental study of a learning algorithm for the longest common subsequence problem, $\textit{LCS}.$ Given an arbitrary input domain, the algorithm learns an $\textit{LCS}-procedure$ tailored to that domain. The learning is done with the help of an oracle, which can be any $\textit{LCS}-algorithm.$ After solving a limited number of training inputs using an oracle, the learning algorithm outputs a new $\textit{LCS}-procedure.Our$ experiments demonstrate that, by allowing a slight loss of optimality, learning yields a procedure which is significantly faster than the oracle. The oracle used for the experiments is the $\textit{np}-procedure$ by Wu et al., a modification of Myers' classical $\textit{LCS}-algorithm.$ We show how to scale up the results of learning on small inputs to inputs of arbitrary lengths. For the domain of two random 2-symbol inputs of length $\textit{n},$ learning yields a program with 0.999 expected accuracy, which runs in $O(n^{1.41})-time,$ in contrast with $O(n^{2}/log$ $\textit{n})$ running time of the fastest theoretical algorithm that produces optimal solutions. For the domain of random 2-symbol inputs of length 100,000, the program runs 10.5 times faster than the $\textit{np}-procedure,$ producing 0.999- accurate outputs. The scaled version of the evolved algorithm applied to random inputs of length 1 million runs approximately 30 times faster than the $\textit{np}-procedure$ while constructing 0.999- accurate solutions. We apply the evolved algorithm to DNA sequences of various lengths by training on random 4-symbol sequences of up to length 10,000. The evolved algorithm, scaled up to the lengths of up to 1.8 million, produces solutions with the 0.998-accuracy in a fraction of the time used by the $\textit{np}.$ ISSN 10846654 Age Range 18 to 22 years ♦ above 22 year Educational Use Research Education Level UG and PG Learning Resource Type Article Publisher Date 2003-12-01 Publisher Place New York e-ISSN 10846654 Journal Journal of Experimental Algorithmics (JEA) Volume Number 8

#### Open content in new tab

Source: ACM Digital Library