Synthesis of speech from text in a voice of a target speaker using neural networks
US12175963B2 · kind B2 · utility
Key dates
| Event | Date |
|---|---|
| Filing date | Nov 30, 2023 |
| Grant date | Dec 24, 2024 |
| Priority date | — |
| Expiry date | Nov 30, 2043 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L2013/021
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech synthesis. The methods, systems, and apparatus include actions of obtaining an audio representation of speech of a target speaker, obtaining input text for which speech is to be synthesized in a voice of the target speaker, generating a speaker vector by providing the audio representation to a speaker encoder engine that is trained to distinguish speakers from one another, generating an audio representation of the input text spoken in the voice of the target speaker by providing the input text and the speaker vector to a spectrogram generation engine that is trained using voices of reference speakers to generate audio representations, and providing the audio representation of the input text spoken in the voice of the target speaker for output.
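The abstract describes a data flow with two trained components. A speaker encoder maps reference audio of the target speaker to a fixed-size speaker vector. A spectrogram generator then produces an audio representation of the input text conditioned on that vector. The Python below is a minimal, non-learned sketch of that data flow only. All names (`speaker_encoder`, `spectrogram_generator`, `synthesize`, `cosine_similarity`) are hypothetical, and so are the mean-pooling encoder and the element-wise-sum conditioning. They are toy stand-ins, not the patented networks or their training procedure.

```python
import math
from typing import List

Frame = List[float]        # one time step of a spectrogram-like representation
Spectrogram = List[Frame]  # frames ordered in time


def _l2_normalize(vec: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0.0 else list(vec)


def speaker_encoder(audio: Spectrogram) -> List[float]:
    """Hypothetical stand-in for the trained speaker encoder engine.

    Mean-pools frames over time and L2-normalizes the result, so utterances of
    any length map to one fixed-size speaker vector. A real encoder would be a
    neural network trained to distinguish speakers; nothing is learned here.
    """
    if not audio:
        raise ValueError("reference audio must contain at least one frame")
    dim = len(audio[0])
    pooled = [sum(frame[i] for frame in audio) / len(audio) for i in range(dim)]
    return _l2_normalize(pooled)


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Compares two speaker vectors; values near 1.0 suggest the same speaker.

    Assumes neither vector is all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def _embed_char(ch: str, dim: int) -> List[float]:
    """Toy, fixed 'text encoding' for one character (no learning involved)."""
    code = ord(ch) + 1
    return [math.sin(code * (i + 1)) for i in range(dim)]


def spectrogram_generator(text: str, speaker_vector: List[float]) -> Spectrogram:
    """Hypothetical stand-in for the spectrogram generation engine.

    Emits one frame per input character. Each frame adds the speaker vector to
    the character's encoding, so the same text yields different output for
    different speaker vectors.
    """
    dim = len(speaker_vector)
    return [
        [t + s for t, s in zip(_embed_char(ch, dim), speaker_vector)]
        for ch in text
    ]


def synthesize(reference_audio: Spectrogram, text: str) -> Spectrogram:
    """Reference audio -> speaker vector -> audio representation of the text."""
    speaker_vector = speaker_encoder(reference_audio)
    return spectrogram_generator(text, speaker_vector)


if __name__ == "__main__":
    # Made-up reference frames for two speakers (3 features per frame).
    speaker_a = [[2.0, 0.0, 0.0], [4.0, 0.0, 0.0]]
    speaker_b = [[0.0, 3.0, 0.0], [0.0, 1.0, 0.0]]
    out_a = synthesize(speaker_a, "hello")
    out_b = synthesize(speaker_b, "hello")
    print(len(out_a), len(out_a[0]))  # 5 3
    print(out_a != out_b)             # True
```

The speaker vector has a fixed size regardless of how long the reference recording is. That is what lets the generator accept audio of any length from a new speaker. Producing a waveform from the output representation is outside what the abstract describes.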
Source: USPTO / EPO open patent data.