### Complete inverted files for efficient text retrieval and analysisComplete inverted files for efficient text retrieval and analysis

Access Restriction
Subscribed

 Author Blumer, A. ♦ Blumer, J. ♦ Haussler, D. ♦ McConnell, R. ♦ Ehrenfeucht, A. Source ACM Digital Library Content type Text Publisher Association for Computing Machinery (ACM) File Format PDF Copyright Year ©1987 Language English
 Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science Abstract Given a finite set of texts $\textit{S}$ = ${\textit{w}1,$ … , $\textit{wk}}$ over some fixed finite alphabet &Sgr;, a complete inverted file for $\textit{S}$ is an abstract data type that provides the functions $\textit{find}(\textit{w}),$ which returns the longest prefix of $\textit{w}$ that occurs (as a subword of a word) in $\textit{S};$ $\textit{freq}(\textit{w}),$ which returns the number of times $\textit{w}$ occurs in $\textit{S};$ and $\textit{locations}(\textit{w}),$ which returns the set of positions where $\textit{w}$ occurs in $\textit{S}.$ A data structure that implements a complete inverted file for $\textit{S}$ that occupies linear space and can be built in linear time, using the uniform-cost RAM model, is given. Using this data structure, the time for each of the above query functions is optimal. To accomplish this, techniques from the theory of finite automata and the work on suffix trees are used to build a deterministic finite automaton that recognizes the set of all subwords of the set $\textit{S}.$ This automaton is then annotated with additional information and compacted to facilitate the desired query functions. The result is a data structure that is smaller and more flexible than the suffix tree. ISSN 00045411 Age Range 18 to 22 years ♦ above 22 year Educational Use Research Education Level UG and PG Learning Resource Type Article Publisher Date 1987-07-01 Publisher Place New York e-ISSN 1557735X Journal Journal of the ACM (JACM) Volume Number 34 Issue Number 3 Page Count 18 Starting Page 578 Ending Page 595

#### Open content in new tab

Source: ACM Digital Library