Author Gantner, Zeno ♦ Schmidt-Thieme, Lars
Source CiteSeerX
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Abstract Wikipedia’s article contents and its category hierarchy are widely used to produce semantic resources which improve performance on tasks like text classification and keyword extraction. The reverse – using text classification methods for predicting the categories of Wikipedia articles – has attracted less attention so far. We propose to “return the favor” and use text classifiers to improve Wikipedia. This could support the emergence of a virtuous circle between the wisdom of the crowds and machine learning/NLP methods. We define the categorization of Wikipedia articles as a multi-label classification task, describe two solutions to the task, and perform experiments that show that our approach is feasible despite the high number of labels.
