Patent · US Active

Method, apparatus and computer program providing a multi-speaker database for concatenative text-to-speech synthesis

US7716052B2 · kind B2 · utility

Cited by	6
References	12
Claims	17
Family size	0

Assignee

Nuance Communications, Inc. · US

Inventors

Andrew Aaron · Ardsley, US
Ellen M. Eide · Mount Kisco, US
Wael Hamza · Yorktown Heights, US
Michael A. Picheny · White Plains, US
Charles T. Rutherfoord · Delray Beach, US
Zhi Wei Shuang · Beijing, CN
Maria E. Smith · Pembroke Pines, US

Key dates

Filing date	Apr 7, 2005
Grant date	May 11, 2010
Priority date	—
Expiry date	May 16, 2027

Classification

Technology area (CPC G)	Physics
CPC primary	G10L2021/0135
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A method, an apparatus and a computer program product generate an audible speech word that corresponds to text. The method includes providing a text word and, in response to the text word, processing pre-recorded speech segments derived from a plurality of speakers to selectively concatenate speech segments, based on at least one cost function, to form audio data for generating an audible speech word that corresponds to the text word. A data structure is also provided for use in a concatenative text-to-speech system. It includes a plurality of speech segments derived from a plurality of speakers, where each speech segment has an associated attribute vector comprising at least one attribute vector element that identifies the speaker from which the speech segment was derived.
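The idea in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not specify the cost function, so the pitch-based target cost, the join cost, the speaker-switch penalty, the unit labels, and every name below (`Segment`, `select_segments`, `DEMO_DB`) are assumptions made for illustration. The sketch stores a speaker identifier in each segment's attributes and uses a Viterbi-style search to pick the lowest-cost chain of segments from a pooled multi-speaker database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    unit: str      # phonetic unit this recording realizes (hypothetical labels)
    speaker: str   # attribute-vector element identifying the source speaker
    pitch: float   # illustrative acoustic attribute in Hz (an assumption)


def target_cost(seg, wanted_pitch):
    # Assumed target cost: distance of the segment's pitch from the desired pitch.
    return abs(seg.pitch - wanted_pitch) / 100.0


def join_cost(prev, cur, speaker_switch_penalty):
    # Assumed join cost: pitch discontinuity plus a penalty for changing speaker.
    cost = abs(prev.pitch - cur.pitch) / 100.0
    if prev.speaker != cur.speaker:
        cost += speaker_switch_penalty
    return cost


def select_segments(units, database, wanted_pitch=120.0, speaker_switch_penalty=1.0):
    """Return (total_cost, path) for the cheapest chain of segments covering `units`."""
    candidates = [[s for s in database if s.unit == u] for u in units]
    if not candidates or any(not c for c in candidates):
        raise ValueError("database has no segment for at least one requested unit")

    # best holds, for each candidate of the current unit, the cheapest path ending there.
    best = [(target_cost(s, wanted_pitch), [s]) for s in candidates[0]]
    for cands in candidates[1:]:
        new_best = []
        for s in cands:
            cost, path = min(
                ((c + join_cost(p[-1], s, speaker_switch_penalty), p) for c, p in best),
                key=lambda item: item[0],
            )
            new_best.append((cost + target_cost(s, wanted_pitch), path + [s]))
        best = new_best
    return min(best, key=lambda item: item[0])


# Toy database pooled from two hypothetical speakers, "A" and "B".
DEMO_DB = [
    Segment("k", "A", 118.0), Segment("k", "B", 120.0),
    Segment("ae", "A", 119.0), Segment("ae", "B", 150.0),
    Segment("t", "A", 121.0), Segment("t", "B", 120.0),
]

if __name__ == "__main__":
    total, path = select_segments(["k", "ae", "t"], DEMO_DB)
    print([(s.unit, s.speaker) for s in path], round(total, 2))
```

In this sketch, raising `speaker_switch_penalty` biases the search toward a single voice, while setting it to zero lets segments from different speakers mix freely whenever their acoustic attributes fit better.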

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.