Access Restriction Open

Author Wu, Zhizheng ♦ Virtanen, Tuomas ♦ Kinnunen, Tomi ♦ Chng, Eng Siong ♦ Li, Haizhou
Source CiteSeerX
Content type Text
File Format PDF
Language English
Subject Domain (in DDC) Computer science, information & general works ♦ Data processing & computer science
Description Although temporal information of speech has been shown to play an important role in perception, most voice conversion approaches assume that speech frames are independent of each other, thereby ignoring temporal information. In this study, we improve the conventional unit selection approach by using exemplars that span multiple frames as base units, and we incorporate a temporal information constraint into voice conversion by using overlapping frames to generate the speech parameters. This approach provides a more stable concatenation cost and avoids the discontinuity problem of the conventional unit selection approach. The proposed method also avoids the over-smoothing problem of the mainstream joint density Gaussian mixture model (JD-GMM) based conversion method by directly using the target speaker's training data to synthesize the converted speech. Both objective and subjective evaluations indicate that our proposed method outperforms the JD-GMM and conventional unit selection methods. Index Terms: voice conversion, unit selection, multi-frame exemplar, temporal information
in Proc. Interspeech
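The abstract describes the method only at a high level. A minimal illustrative sketch of exemplar-based unit selection with multi-frame exemplars and overlap-add smoothing is given below; the Euclidean target/concatenation costs, the greedy left-to-right search, and all function names are simplifying assumptions, not the authors' actual implementation.

```python
# Illustrative sketch (not the paper's implementation): unit selection
# voice conversion with multi-frame exemplars and overlap-add smoothing.
import numpy as np

def make_exemplars(frames, span):
    """Stack `span` consecutive frames into one multi-frame exemplar."""
    n = frames.shape[0] - span + 1
    return np.stack([frames[i:i + span].ravel() for i in range(n)])

def convert(src_frames, src_train, tgt_train, span=3, w_concat=1.0):
    """Greedy unit selection: for each source exemplar, pick the paired
    training exemplar minimizing target cost + concatenation cost,
    then overlap-add the selected target frames."""
    src_ex = make_exemplars(src_frames, span)   # exemplars to convert
    src_db = make_exemplars(src_train, span)    # source side of paired data
    tgt_db = make_exemplars(tgt_train, span)    # target side of paired data

    dim = src_frames.shape[1]
    out = np.zeros((src_frames.shape[0], dim))
    weight = np.zeros(src_frames.shape[0])
    prev = None

    for t, ex in enumerate(src_ex):
        # Target cost: distance between the source exemplar and the
        # source side of each stored exemplar pair.
        target_cost = np.linalg.norm(src_db - ex, axis=1)
        # Concatenation cost: distance between candidate target exemplars
        # and the previously selected one (zero for the first exemplar).
        concat_cost = (np.linalg.norm(tgt_db - prev, axis=1)
                       if prev is not None else 0.0)
        best = np.argmin(target_cost + w_concat * concat_cost)
        prev = tgt_db[best]

        # Overlap-add: overlapping frames of consecutive exemplars are
        # averaged, smoothing the trajectory instead of hard concatenation.
        out[t:t + span] += tgt_db[best].reshape(span, dim)
        weight[t:t + span] += 1.0

    return out / weight[:, None]
```

Because the selected units come directly from the target speaker's training frames, no statistical averaging of spectra is involved, which is why such a scheme sidesteps the over-smoothing associated with JD-GMM conversion.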
Educational Role Student ♦ Teacher
Age Range above 22 years
Educational Use Research
Education Level UG and PG ♦ Career/Technical Study
Learning Resource Type Article
Publisher Date 2013-01-01