Access Restriction Subscribed

Author Schonhofen, P.
Sponsorship IEEE Comput. Soc. ♦ WIC ♦ ACM
Source IEEE Xplore Digital Library
Content type Text
Publisher Institute of Electrical and Electronics Engineers, Inc. (IEEE)
File Format PDF
Copyright Year ©2006
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Special computer methods
Subject Keyword Content based retrieval ♦ Automation ♦ Taxonomy ♦ Clustering algorithms ♦ Encyclopedias ♦ Ontologies ♦ Wikipedia ♦ Information retrieval ♦ Computer networks ♦ Testing
Abstract The size and coverage of Wikipedia, a freely available online encyclopedia, have reached the point where it can be used much like an ontology or taxonomy to identify the topics discussed in a document. In this paper we show that even a simple algorithm that exploits only the titles and categories of Wikipedia articles can characterize documents by Wikipedia categories surprisingly well. We test the reliability of our method by predicting the categories of Wikipedia articles themselves from their bodies, and by performing classification and clustering on 20 Newsgroups and RCV1, representing documents by their Wikipedia categories instead of their texts.
Description Author affiliation: Comput. & Autom. Res. Inst., Hungarian Acad. of Sci., Budapest (Schonhofen, P.)
ISBN 0769527477
Educational Role Student ♦ Teacher
Age Range above 22 years
Educational Use Research ♦ Reading
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2006-12-18
Publisher Place China
Rights Holder Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Size 328.28 kB
Page Count 7
Starting Page 456
Ending Page 462
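
Illustrative note: the abstract describes characterizing a document using only the titles and categories of Wikipedia articles. The sketch below is one possible reading of that idea, not the paper's actual algorithm, which this record does not spell out. The TITLE_CATEGORIES table, the rank_categories function, and the one-vote-per-match scoring are all hypothetical stand-ins chosen for illustration; a real system would load titles and categories from a Wikipedia dump.

```python
from collections import Counter
import re

# Hypothetical toy data standing in for Wikipedia article titles and
# the categories assigned to them (assumption, not taken from the paper).
TITLE_CATEGORIES = {
    "information retrieval": ["Information science", "Computer science"],
    "ontology": ["Knowledge representation"],
    "cluster analysis": ["Machine learning", "Statistics"],
    "encyclopedia": ["Reference works"],
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_categories(text, title_categories=TITLE_CATEGORIES, max_title_len=3):
    """Count one vote per category for every title found in the text.

    Titles are matched as exact word n-grams of up to max_title_len words.
    Returns (category, votes) pairs sorted by descending vote count.
    """
    tokens = tokenize(text)
    votes = Counter()
    for n in range(1, max_title_len + 1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            for category in title_categories.get(phrase, []):
                votes[category] += 1
    return votes.most_common()


if __name__ == "__main__":
    sample = ("Ontology and information retrieval: "
              "an encyclopedia of information retrieval.")
    for category, count in rank_categories(sample):
        print(category, count)
```

The ranked categories could then replace the raw text as document features for classification or clustering, which is the evaluation setup the abstract mentions.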

