Access Restriction Open

Author Taghva, Kazem ♦ Borsack, Julie ♦ Condit, Allen
Source CiteSeerX
Content type Text
File Format PDF
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword OCR Version ♦ OCR Error ♦ Relevant Document Ranking ♦ Considerable Role ♦ Cosine Normalization ♦ Vector Space Model ♦ Full Text Document Collection ♦ Average Precision
Description We report on the performance of the vector space model in the presence of OCR errors. We show that average precision and recall are not affected for our full-text document collection when the OCR version is compared to its corresponding corrected set. We do see divergence, though, between the relevant document rankings of the OCR and corrected collections with different weighting combinations. In particular, we observed that cosine normalization plays a considerable role in the disparity seen between the collections. Furthermore, we show that even though feedback improves retrieval for both collections, it cannot be used to compensate for OCR errors caused by badly degraded documents.
Educational Role Student ♦ Teacher
Age Range above 22 years
Educational Use Research
Education Level UG and PG ♦ Career/Technical Study
Learning Resource Type Article
Publisher Date 1996-01-01
Publisher Institution Inf. Proc. and Management