Method and apparatus for improved duration modeling of phonemes
US6785652B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Event | Date |
| --- | --- |
| Filing date | Dec 19, 2002 |
| Grant date | Aug 31, 2004 |
| Priority date | — |
| Expiry date | Dec 19, 2022 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L13/08
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method and an apparatus for improved duration modeling of phonemes in a speech synthesis system are provided. According to one aspect, text is received into a processor of a speech synthesis system. The received text is processed using a sum-of-products phoneme duration model that is used in either the formant method or the concatenative method of speech generation. The phoneme duration model, which is used along with a phoneme pitch model, is produced by developing a non-exponential functional transformation form for use with a generalized additive model. The non-exponential functional transformation form comprises a root sinusoidal transformation that is controlled in response to a minimum phoneme duration and a maximum phoneme duration. The minimum and maximum phoneme durations are observed in training data. The received text is processed by specifying at least one of a number of contextual factors for the generalized additive model. An inverse of the non-exponential functional transformation is applied to duration observations, or training data. Coefficients are generated for use with the generalized additive model. The generalized additive model comprising the coefficients i…
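The pipeline the abstract describes (transform the observed durations, fit coefficients for an additive model over contextual factors, then map predictions back to durations) can be illustrated with a minimal Python sketch. Several things here are assumptions rather than details taken from the patent: the abstract does not give the formula for the root sinusoidal transformation, so the form below (a sine of a power of the normalized duration, raised to a second power) is only one plausible reading; `alpha` and `beta` are hypothetical shape parameters; the factor names and training durations are made-up toy values; and the fit uses simple backfitting over main effects only, which is a simplification of the sum-of-products model named in the abstract.

```python
import math

# ASSUMPTION: the abstract does not state the transformation's formula.
# This form is illustrative only; alpha and beta are hypothetical parameters.

def to_model_space(d, d_min, d_max, alpha=1.0, beta=1.0):
    """Map a duration d in [d_min, d_max] to a value in [0, 1].

    This is the direction applied to the training observations.
    """
    u = (d - d_min) / (d_max - d_min)          # normalize to [0, 1]
    return math.sin(0.5 * math.pi * u ** alpha) ** beta

def to_duration(y, d_min, d_max, alpha=1.0, beta=1.0):
    """Map a model-space value back to a duration.

    The value is clamped to [0, 1] first, so the result can never leave
    the observed range [d_min, d_max].
    """
    y = min(1.0, max(0.0, y))
    u = (2.0 / math.pi * math.asin(y ** (1.0 / beta))) ** (1.0 / alpha)
    return d_min + (d_max - d_min) * u

def fit_additive(rows, targets, n_iter=50):
    """Fit y ~ mu + sum of per-factor level effects by backfitting.

    ASSUMPTION: main effects only; no product (interaction) terms.
    """
    factors = list(rows[0])
    mu = sum(targets) / len(targets)
    coef = {f: {r[f]: 0.0 for r in rows} for f in factors}
    for _ in range(n_iter):
        for f in factors:
            sums, counts = {}, {}
            for r, t in zip(rows, targets):
                # Residual after removing the mean and the other factors.
                rest = sum(coef[g][r[g]] for g in factors if g != f)
                sums[r[f]] = sums.get(r[f], 0.0) + (t - mu - rest)
                counts[r[f]] = counts.get(r[f], 0) + 1
            for level, total in sums.items():
                coef[f][level] = total / counts[level]
    return mu, coef

def predict_duration(row, mu, coef, d_min, d_max, alpha=1.0, beta=1.0):
    """Evaluate the additive model, then transform back to a duration."""
    y = mu + sum(coef[f][row[f]] for f in coef)
    return to_duration(y, d_min, d_max, alpha, beta)

# Toy, made-up training data (durations in milliseconds).
rows = [
    {"stress": "stressed", "position": "final"},
    {"stress": "stressed", "position": "medial"},
    {"stress": "unstressed", "position": "final"},
    {"stress": "unstressed", "position": "medial"},
]
durations_ms = [180.0, 120.0, 110.0, 60.0]

# Minimum and maximum durations are taken from the training data.
d_min, d_max = min(durations_ms), max(durations_ms)
targets = [to_model_space(d, d_min, d_max) for d in durations_ms]
mu, coef = fit_additive(rows, targets)

for row in rows:
    print(row, round(predict_duration(row, mu, coef, d_min, d_max), 1))
```

One property worth noting in this sketch: because the back-transformation is bounded, a predicted duration cannot fall below the shortest or above the longest duration seen in training, which an exponential (log-domain) model does not guarantee.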
Source: USPTO / EPO open patent data (bibliographic data and citation counts).