Access Restriction Open

Author Thomas, Mark R. P. ♦ Gudnason, Jon ♦ Naylor, Patrick A.
Source CiteSeerX
Content type Text
File Format PDF
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Subject Keyword VUS Detection ♦ Glottal Closure Instant ♦ Segmented Time-scale Modification ♦ Time Scale Compression ♦ Many Application ♦ Recorded Voicemail Message ♦ Speech Periodicity ♦ Average Mean Opinion Score ♦ Unvoiced Silence ♦ Time Scale Modification ♦ High Audio Quality ♦ ITU-T P.800 ♦ Motion Video ♦ Gaussian Mixture ♦ Reliable Time Scale Modification ♦ Voiced Speech ♦ Unvoiced Speech ♦ Improved Intelligibility ♦ DYPSA Algorithm ♦ Modification Factor ♦ Speech Time Scale Modification ♦ Fast Scanning ♦ Lip Synchronization ♦ Periodicity Exists ♦ Perceptual Quality
Description in Proc European Signal Processing Conf
This paper presents a method for speech time scale modification. Voiced speech is pseudo-periodic, allowing time scale modification by the repetition or removal of cycles as necessary. However, in unvoiced speech and at the boundaries of voiced speech, no such periodicity exists, so the speech should not be modified there. To address this issue, the proposed approach is novel in its use of the DYPSA algorithm to derive speech periodicity from glottal closure instants (GCIs), followed by a Gaussian mixture model-based voiced/unvoiced/silence (VUS) classifier. A listening test based on ITU-T P.800 has shown that, by employing VUS detection, the average mean opinion score of the perceptual quality of processed speech exceeds that of a method without VUS detection by 0.61 over a range of modification factors. Results are presented as a function of modification factor for normal and fast original talking rates. Reliable time scale modification of high audio quality enables many applications, such as time scale compression for fast scanning of recorded voicemail messages, slowing the talking rate for improved intelligibility in forensics, and lip synchronization in motion video.
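The cycle repetition and removal idea in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `time_scale` and its parameters are hypothetical, the GCI positions and voiced flags are assumed to be supplied by hand (in the paper they would come from DYPSA and the VUS classifier), and the smoothing a practical system would likely apply at cycle joins is omitted.

```python
def time_scale(samples, gcis, voiced, factor):
    """Repeat or drop whole voiced cycles; copy everything else unchanged.

    samples: list of audio samples
    gcis:    sorted sample indices of cycle boundaries (assumed given)
    voiced:  one bool per cycle, len(gcis) - 1 entries (assumed given)
    factor:  > 1 lengthens voiced speech, < 1 shortens it
    """
    if not gcis:
        return list(samples)
    if len(voiced) != len(gcis) - 1:
        raise ValueError("need one voiced flag per cycle")

    out = list(samples[:gcis[0]])      # lead-in before the first boundary
    acc = 0.0                          # fractional copies carried forward
    for i, is_voiced in enumerate(voiced):
        cycle = samples[gcis[i]:gcis[i + 1]]
        if not is_voiced:
            out.extend(cycle)          # unvoiced or silence: leave unmodified
            continue
        acc += factor
        copies = int(acc)              # whole copies to emit for this cycle
        acc -= copies
        for _ in range(copies):
            out.extend(cycle)
    out.extend(samples[gcis[-1]:])     # tail after the last boundary
    return out


if __name__ == "__main__":
    toy = list(range(12))
    print(time_scale(toy, [2, 5, 8, 10], [True, True, False], 2.0))
```

With a factor of 2.0 each voiced cycle in the toy signal is emitted twice, while the unvoiced cycle and the samples outside the marked boundaries pass through untouched. The accumulator spreads non-integer factors across consecutive voiced cycles.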
Educational Role Student ♦ Teacher
Age Range above 22 years
Educational Use Research
Education Level UG and PG ♦ Career/Technical Study
Learning Resource Type Article
Publisher Date 2008-01-01