
Author Zhang, J.J. ♦ Chan, R.H.Y. ♦ Fung, P.
Sponsorship IEEE Signal Processing Society
Source IEEE Xplore Digital Library
Content type Text
Publisher Institute of Electrical and Electronics Engineers, Inc. (IEEE)
File Format PDF
Copyright Year ©2006
Language English
Subject Domain (in DDC) Natural sciences & mathematics ♦ Physics ♦ Electricity & electronics
Subject Keyword Automatic speech recognition ♦ Data mining ♦ Feature extraction ♦ Acoustic testing ♦ Hidden Markov models ♦ Decoding ♦ Loudspeakers ♦ Text recognition ♦ Humans ♦ Search engines ♦ rhetorical information ♦ Extractive speech summarization ♦ lecture speech
Abstract We propose an extractive summarization approach with a novel shallow rhetorical structure learning framework for speech summarization. One of the most under-utilized features in extractive summarization is hierarchical structure information: semantically cohesive units that are hidden in spoken documents. We first present empirical evidence that rhetorical structure is the underlying semantic information, which is rendered in linguistic and acoustic/prosodic forms in lecture speech. A segmental summarization method, where the document is partitioned into rhetorical units by K-means clustering, is first proposed to test this hypothesis. We show that this system produces summaries at 67.36% ROUGE-L F-measure, a 4.29% absolute increase in performance compared with that of the baseline system. We then propose Rhetorical-State Hidden Markov Models (RSHMMs) to automatically decode the underlying hierarchical rhetorical structure in speech. Tenfold cross-validation experiments are carried out on conference speeches. We show that the system based on RSHMMs gives a 71.31% ROUGE-L F-measure, an 8.24% absolute increase in lecture speech summarization performance compared with the baseline system that does not use RSHMMs. Our method also outperforms the baseline with a conventional discourse feature. We also present a thorough investigation of the relative contribution of different features and show that, for lecture speech, speaker-normalized acoustic features contribute the most, at 68.5% ROUGE-L F-measure, compared to 62.9% ROUGE-L F-measure for linguistic features and 59.2% ROUGE-L F-measure for un-normalized acoustic features. This shows that the individual speaking style of each speaker is highly relevant to summarization.
Description Author affiliation :: Dept. of Electron. & Comput. Eng., Hong Kong Univ. of Sci. & Technol., Kowloon, China
ISSN 1558-7916
Education Level UG and PG
Learning Resource Type Article
Publisher Date 2010-08-01
Publisher Place U.S.A.
Rights Holder Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Volume Number 18
Issue Number 6
Size 1.65 MB
Page Count 11
Starting Page 1147
Ending Page 1157
