Access Restriction Open

Author Levit, M. ♦ Gorin, A. L. ♦ Wright, J. H.
Source CiteSeerX
Content type Text
File Format PDF
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword Salient Acoustic Morpheme ♦ Large Speech Corpus ♦ Salient Phone Sequence ♦ HMIHY Task ♦ Salient Phone-sequences ♦ Improved Call-classification Result ♦ Semantic Association ♦ Phone-based Sequence ♦ Previous Strategy ♦ Telecommunication Service ♦ Current Methodology ♦ Spoken Language ♦ Task-independent ASR-system ♦ Multipass Algorithm ♦ Untranscribed Speech Corpus ♦ New Multipass Algorithm ♦ Statistical Language Model
Description Proc. Eurospeech
We are interested in spoken language understanding within the domain of automated telecommunication services. Our current methodology involves training statistical language models from large annotated corpora for recognition and understanding. Because transcribing large speech corpora is a resource-consuming task, we are motivated to exploit speech without transcriptions. In particular, we learn the semantic associations for a task using only phone-based sequences from the output of a task-independent ASR system. In this paper we present a new multipass algorithm for acquiring salient phone sequences from untranscribed speech corpora and evaluate their utility for the HMIHY task. Compared to our previous strategy, this algorithm is shown to produce improved call-classification results while reducing the number of salient phone sequences selected for training by up to 7-fold.
Educational Role Student ♦ Teacher
Age Range Above 22 years
Educational Use Research
Education Level UG and PG ♦ Career/Technical Study
Learning Resource Type Article
Publisher Date 2001-01-01