Access Restriction
Open

Author Soltau, Hagen ♦ Yu, Hua ♦ Metze, Florian ♦ Fügen, Christian ♦ Jin, Qin ♦ Jou, Szu-Chen
Source CiteSeerX
Content type Text
File Format PDF
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword System Development ♦ Acoustic Modelling ♦ Detailed Result ♦ Phone Dependent Semi-tied Full Covariance ♦ Semi-tied Clustering ♦ Error Rate ♦ Context Dependent Interpolation ♦ Language Model ♦ RT-03 CTS Evaluation Set ♦ Multipass Transcription Scheme ♦ Relative Improvement ♦ Development Set ♦ Robust Estimation ♦ ISL Rich Transcription System ♦ Language Modelling ♦ Final System ♦ Feature Adaptive Training
Description This paper describes the ISL large vocabulary conversational telephony speech recognition system, which was tested in NIST's RT-03S ("Switchboard") evaluation. We present our experiments on improving preprocessing, acoustic modelling, and language modelling. The system features phone dependent semi-tied full covariances, semi-tied clustering of septa-phones, clustering across phones, feature adaptive training, robust estimation of VTLN and MLLR, as well as context dependent interpolation of language models. We present detailed results for each stage of our multipass transcription scheme. System development started in 2002 with an error rate of 35.1% on our internal 1h development set. The final system performed at WER 21.8%, a 38% relative improvement. The error rate on the RT-03 CTS evaluation set is 23.4%.
In ICASSP
Educational Role Student ♦ Teacher
Age Range above 22 years
Educational Use Research
Education Level UG and PG ♦ Career/Technical Study
Learning Resource Type Article
Publisher Date 2004-01-01
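
Note on the figures in the Description: the "38% relative improvement" follows from the two development-set error rates quoted there (35.1% at the start of development, 21.8% for the final system). The short sketch below only reproduces that arithmetic; the function name relative_improvement is a hypothetical helper chosen for illustration and does not come from the paper.

```python
def relative_improvement(baseline_wer: float, final_wer: float) -> float:
    """Return the relative WER reduction, in percent, from baseline to final.

    Hypothetical helper for illustration; both arguments are word error
    rates expressed in percent.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - final_wer) / baseline_wer * 100.0


# Development-set figures quoted in the Description field.
dev_start = 35.1   # WER (%) when system development started in 2002
dev_final = 21.8   # WER (%) of the final system

improvement = relative_improvement(dev_start, dev_final)
print(f"{improvement:.1f}% relative improvement")
```

The 23.4% figure in the Description is measured on the RT-03 CTS evaluation set, not on the internal development set, so it is not comparable with the 35.1% starting point and is left out of this calculation.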