Access Restriction Open

Author Inkpen, Diana ♦ Désilets, Alain
Source CiteSeerX
Content type Text
File Format PDF
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword Automatic Speech Recognition ♦ Semantic Similarity ♦ Several Variant ♦ Recognition Error ♦ Many Recognition Error ♦ Good Content Word ♦ Content Word ♦ Wide Range ♦ Large Volume ♦ Automatic Speech Transcript ♦ Semantic Outlier ♦ Evaluation Measure ♦ Spoken Audio ♦ Spoken Audio Document ♦ End User ♦ Challenging Task
Description Browsing through large volumes of spoken audio is known to be a challenging task for end users. One way to alleviate this problem is to allow users to gist a spoken audio document by glancing over a transcript generated through Automatic Speech Recognition. Unfortunately, such transcripts typically contain many recognition errors which are highly distracting and make gisting more difficult. In this paper we present an approach that detects recognition errors by identifying words which are semantic outliers with respect to other words in the transcript. We describe several variants of this approach. We investigate a wide range of evaluation measures and we show that we can significantly reduce the number of errors in content words, with the trade-off of losing some good content words.
Educational Role Student ♦ Teacher
Age Range above 22 years
Educational Use Research
Education Level UG and PG ♦ Career/Technical Study
Learning Resource Type Article
Publisher Date 2005-01-01
Publisher Institution In: Proceedings of EMNLP. Association for Computational Linguistics