Access Restriction Subscribed

Author Zohar, Hadas ♦ Liebeskind, Chaya ♦ Schler, Jonathan ♦ Dagan, Ido
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Copyright Year ©2013
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword Hebrew ♦ Language model ♦ Cultural heritage
Abstract This article describes methods for semiautomatic thesaurus construction for a cross-generation, cross-genre, and cross-cultural corpus. Semiautomatic thesaurus construction is a complex task, and applying it to a cross-generation corpus brings its own challenges. We used a Jewish juristic corpus whose documents span multiple genres, were written over 2000 years, and contain a mix of languages, dialects, geographies, and writing styles. We evaluated different first-order and second-order methods and introduced a special annotation scheme for this problem, which showed that first-order methods performed surprisingly well. We found that, in our case, improving coverage is the more difficult task. To address this, we introduce a new algorithm to increase recall (coverage), which is applicable to many other problems as well and demonstrates significant improvement on our corpus.
ISSN 15564673
Age Range 18 to 22 years ♦ above 22 years
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2013-04-11
Publisher Place New York
e-ISSN 15564711
Journal Journal on Computing and Cultural Heritage (JOCCH)
Volume Number 6
Issue Number 1
Page Count 19
Starting Page 1
Ending Page 19

