Access Restriction Subscribed

Author Barton, Ian J. ♦ Snell, Michael J. ♦ Creasey, Susan E. ♦ Lynch, Michael F.
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Language English
Subject Keyword Direct access ♦ Character string ♦ Information retrieval ♦ File organization ♦ Bit vector ♦ Text searching ♦ Information theory
Abstract Using direct access computer files of bibliographic information, an attempt is made to overcome one of the problems often associated with information retrieval, namely, the maintenance and use of large dictionaries, the greater part of which is used only infrequently. A novel method is presented, which maps the hyperbolic frequency distribution of text characteristics onto a rectangular distribution. This is more suited to implementation on storage devices. This method treats text as a string of characters rather than words bounded by spaces, and chooses subsets of strings such that their frequencies of occurrence are more even than those of word types. The members of this subset are then used as index keys for retrieval. The rectangular distribution of key frequencies results in a much simplified file organization and promises considerable cost advantages.
Description Affiliation: Univ. of Sheffield, Sheffield, U.K. (Barton, Ian J.; Creasey, Susan E.; Lynch, Michael F.; Snell, Michael J.)
Age Range 18 to 22 years ♦ above 22 years
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2005-08-01
Publisher Place New York
Journal Communications of the ACM (CACM)
Volume Number 17
Issue Number 6
Page Count 6
Starting Page 345
Ending Page 350
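The abstract describes choosing character strings, rather than words, as index keys so that key frequencies are more even than word frequencies. The Python sketch below illustrates one simple way such a selection could work. It is an assumption for illustration, not the authors' published procedure. The threshold `max_freq` and the functions `select_keys` and `extension_counts` are hypothetical. Any string occurring more than `max_freq` times is replaced by its one-character extensions. Frequent fragments therefore grow longer, and rarer, until each selected key falls under the threshold.

```python
from collections import Counter


def extension_counts(text: str, key: str) -> Counter:
    """Count key+c for every overlapping occurrence of key followed by a character c."""
    counts = Counter()
    n = len(key)
    start = text.find(key)
    while start != -1:
        if start + n < len(text):  # an occurrence at the very end cannot be extended
            counts[text[start:start + n + 1]] += 1
        start = text.find(key, start + 1)  # step by one so overlaps are counted
    return counts


def select_keys(text: str, max_freq: int) -> dict:
    """Hypothetical key selection: split over-frequent strings into longer ones.

    Starts from single characters. Any string occurring more than max_freq
    times is replaced by its one-character extensions, which are rarer.
    This caps key frequencies from above, evening out the distribution.
    """
    if max_freq < 1:
        raise ValueError("max_freq must be at least 1")
    pending = Counter(text)  # single characters and their frequencies
    keys = {}
    while pending:
        key, freq = pending.popitem()
        if freq <= max_freq:
            keys[key] = freq  # frequent enough to be useful, rare enough to keep
        else:
            pending.update(extension_counts(text, key))  # lengthen the fragment
    return keys


if __name__ == "__main__":
    sample = "the cat sat on the mat and the rat sat on the cat"
    for key, freq in sorted(select_keys(sample, 3).items(), key=lambda kv: -kv[1]):
        print(repr(key), freq)
```

This sketch only bounds frequencies from above; rare fragments stay rare. It therefore shows the direction of the idea (lengthening frequent strings to flatten the distribution) rather than reproducing how the paper reaches a rectangular distribution.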
