Geometric suffix tree: Indexing protein 3-D structuresGeometric suffix tree: Indexing protein 3-D structures

Access Restriction
Subscribed

 Author Shibuya, Tetsuo Source ACM Digital Library Content type Text Publisher Association for Computing Machinery (ACM) File Format PDF Copyright Year ©2010 Language English
 Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science Subject Keyword Suffix tree ♦ Protein 3-D structure ♦ Root mean square deviation Abstract Protein structure analysis is one of the most important research issues in the post-genomic era, and faster and more accurate index data structures for such 3-D structures are highly desired for research on proteins. This article proposes a new data structure for indexing protein 3-D structures. For strings, there are many efficient indexing structures such as suffix trees, but it has been considered very difficult to design such sophisticated data structures against 3-D structures like proteins. Our index structure is based on the suffix tree and is called the geometric suffix tree. By using the geometric suffix tree for a set of protein structures, we can exactly search for all of their substructures whose RMSDs (root mean square deviations) or URMSDs (unit-vector root mean square deviations) to a given query 3-D structure are not larger than a given bound. Though there are $O(N^{2})$ substructures in a structure of size $\textit{N},$ our data structure requires only $\textit{O}(\textit{N})$ space for indexing all the substructures. We propose an $O(N^{2})$ construction algorithm for it, while a naive algorithm would require $O(N^{3})$ time to construct it. Moreover we propose an efficient search algorithm. Experiments show that we can search for similar structures much faster than previous algorithms if the RMSD threshold is not larger than 1Å. The experiments also show that the construction time of the geometric suffix tree is practically almost linear to the size of the database, when applied to a protein structure database. ISSN 00045411 Age Range 18 to 22 years ♦ above 22 year Educational Use Research Education Level UG and PG Learning Resource Type Article Publisher Date 2010-03-01 Publisher Place New York e-ISSN 1557735X Journal Journal of the ACM (JACM) Volume Number 57 Issue Number 3 Page Count 17 Starting Page 1 Ending Page 17

Open content in new tab

Source: ACM Digital Library