Access Restriction

Author Blum, Avrim ♦ Jiang, Tao ♦ Li, Ming ♦ Tromp, John ♦ Yannakakis, Mihalis
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Copyright Year ©1994
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword Approximation algorithm ♦ Shortest common superstring
Abstract We consider the following problem: given a collection of strings $\textit{s1},…,$ $\textit{sm},$ find the shortest string $\textit{s}$ such that each $\textit{si}$ appears as a substring (a consecutive block) of $\textit{s}.$ Although this problem is known to be NP-hard, a simple greedy procedure appears to do quite well and is routinely used in DNA sequencing and data compression practice, namely: repeatedly merge the pair of (distinct) strings with maximum overlap until only one string remains. Let $\textit{n}$ denote the length of the optimal superstring. A common conjecture states that the above greedy procedure produces a superstring of length $\textit{O(n)}$ (in fact, $2\textit{n}),$ yet the only previous nontrivial bound known for any polynomial-time algorithm is a recent $\textit{O(n}$ log $\textit{n})$ result.We show that the greedy algorithm does in fact achieve a constant factor approximation, proving an upper bound of $4\textit{n}.$ Furthermore, we present a simple modified version of the greedy algorithm that we show produces a superstring of length at most $3\textit{n}.$ We also show the superstring problem to be MAXSNP-hard, which implies that a polynomial-time approximation scheme for this problem is unlikely.
ISSN 00045411
Age Range 18 to 22 years ♦ above 22 year
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 1994-07-01
Publisher Place New York
e-ISSN 1557735X
Journal Journal of the ACM (JACM)
Volume Number 41
Issue Number 4
Page Count 18
Starting Page 630
Ending Page 647

Open content in new tab

   Open content in new tab
Source: ACM Digital Library