Access Restriction Subscribed

Author Aletras, Nikolaos ♦ Stevenson, Mark ♦ Clough, Paul
Source ACM Digital Library
Content type Text
Publisher Association for Computing Machinery (ACM)
File Format PDF
Copyright Year ©2013
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword Digital libraries ♦ Europeana ♦ Semantic similarity
Abstract Large amounts of cultural heritage content have now been digitized and are available in digital libraries. However, these collections are often unstructured and difficult to navigate. Automatic techniques for identifying similar items in these collections could be used to improve navigation, since they would allow items that are implicitly connected to be linked together and sets of similar items to be clustered. Europeana is a large digital library containing more than 20 million digital objects from a set of cultural heritage providers throughout Europe. The diverse nature of this collection means that the items do not have standard metadata to assist navigation. A range of methods for computing the similarity between pairs of texts is applied to metadata records in Europeana in order to estimate the similarity between items. Various methods for computing similarity have been proposed and can be classified into two main approaches: (1) knowledge-based approaches, which make use of external knowledge sources, and (2) corpus-based approaches, which rely on analyzing the frequency distributions of words in documents. Both techniques are evaluated against manual judgements obtained for this study and a multiple-choice test created from manually generated categories in cultural heritage collections. We find that a combination of corpus-based and knowledge-based approaches provides the best results in both experiments.
ISSN 1556-4673
Age Range 18 to 22 years ♦ above 22 years
Educational Use Research
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2013-01-09
Publisher Place New York
e-ISSN 1556-4711
Journal Journal on Computing and Cultural Heritage (JOCCH)
Volume Number 5
Issue Number 4
Page Count 19
Starting Page 1
Ending Page 19

