Access Restriction Open

Author Pera, Maria Soledad ♦ Ng, Yiu-Kai
Source CiteSeerX
Content type Text
File Format PDF
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword Entire Document ♦ Word Similarity ♦ Information Search Process ♦ Web Document ♦ Web Document Summary ♦ Extractive Single-document Summarization Approach ♦ Predefined Class ♦ Considerable Amount ♦ Text Classification ♦ Classified Web Document ♦ Newsgroups Datasets ♦ Summarization Task ♦ Word-correlation Factor ♦ Significance Factor ♦ Extractive Summarization Method ♦ Multinomial Naïve ♦ Classification Process ♦ High-quality Summary ♦ Enhanced Approach Corsum-sf ♦ Information Need ♦ Large Collection ♦ Document Summarization ♦ Enhance Corsum ♦ Bayes Classifier ♦ Classification Time ♦ Experimental Result
Abstract Text classification categorizes web documents in large collections into predefined classes based on their contents. Unfortunately, the classification process can be time-consuming, and users are still required to spend a considerable amount of time scanning through the classified web documents to identify the ones whose contents satisfy their information needs. To solve this problem, we first introduce CorSum, an extractive single-document summarization approach that is simple and effective in performing the summarization task, since it relies only on word similarity to generate high-quality summaries. We further enhance CorSum by considering the significance factor of sentences in documents, in addition to using word-correlation factors, for document summarization. We denote the enhanced approach CorSum-SF and use the summaries generated by CorSum-SF to train a Multinomial Naïve Bayes classifier for categorizing web document summaries into predefined classes. Experimental results on the DUC-2002 and 20 Newsgroups datasets show that CorSum-SF outperforms other extractive summarization methods, and that using CorSum-SF generated summaries instead of the entire documents significantly reduces classification time while keeping classification accuracy comparable. More importantly, browsing summaries, instead of entire documents, that are assigned to predefined categories facilitates the information search process on the Web.
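The general idea of extractive summarization driven by word similarity can be illustrated with a minimal sketch. This is not the authors' CorSum or CorSum-SF: the abstract does not define the word-correlation factors or the significance factor, so the sketch assumes plain Jaccard word overlap as a stand-in similarity measure, and the names `tokenize`, `sentence_similarity`, and `summarize` are hypothetical.

```python
import re


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def sentence_similarity(words_a, words_b):
    """Jaccard overlap of two token lists.

    Assumption: this stands in for the word-correlation factors
    mentioned in the abstract, which are not specified there.
    """
    set_a, set_b = set(words_a), set(words_b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def summarize(sentences, k=2):
    """Return the k sentences most similar to the rest of the document.

    Each sentence is scored by the sum of its similarities to every
    other sentence; the top-k sentences are returned in document order.
    """
    tokens = [tokenize(s) for s in sentences]
    scores = []
    for i, words_i in enumerate(tokens):
        score = sum(
            sentence_similarity(words_i, words_j)
            for j, words_j in enumerate(tokens)
            if j != i
        )
        scores.append(score)
    # sorted() is stable, so ties keep their original document order.
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


doc = [
    "Text classification assigns web documents to classes.",
    "Summaries of web documents speed up text classification.",
    "The weather was pleasant yesterday.",
]
summary = summarize(doc, k=2)
print(summary)
```

Summaries produced this way could then be used, in place of full documents, as the training and test input to a text classifier, which is the setup the abstract describes for the Multinomial Naïve Bayes classifier.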
Educational Role Student ♦ Teacher
Age Range above 22 years
Educational Use Research
Education Level UG and PG ♦ Career/Technical Study