Patent · US Active

Two-level speech prosody transfer

US12327544B2 · kind B2 · utility

Cited by	0

References	1

Claims	20

Family size	0

Assignee

Google LLC · US

Inventors

Lev Finkelstein · Netanya, IL
Chun-an Chan · Mountain View, US
Byungha Chun · Warrington, GB
Ye Jia · Santa Clara, US
Yu Zhang · Mountain View, US
Robert Andrew James Clark · Stapleford, GB
Vincent Wan · Cambridge, GB

Key dates

Filing date	Nov 11, 2022
Grant date	Jun 10, 2025
Priority date	—
Expiry date	Jan 1, 2043

Classification

Technology area (CPC G)	Physics
CPC primary	G10L17/18
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A method includes receiving an input text utterance to be synthesized into expressive speech having an intended prosody and a target voice and generating, using a first text-to-speech (TTS) model, an intermediate synthesized speech representation for the input text utterance. The intermediate synthesized speech representation possesses the intended prosody. The method also includes providing the intermediate synthesized speech representation to a second TTS model that includes an encoder portion and a decoder portion. The encoder portion is configured to encode the intermediate synthesized speech representation into an utterance embedding that specifies the intended prosody. The decoder portion is configured to process the input text utterance and the utterance embedding to generate an output audio signal of expressive speech that has the intended prosody specified by the utterance embedding and speaker characteristics of the target voice.
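The data flow described in the abstract can be sketched in a few lines of Python. This is only an illustrative outline, not the patented implementation: the class name TwoLevelProsodyTransfer, the function names, and the list-of-floats representations are all hypothetical, and the three toy functions stand in for trained models purely so the example runs.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class TwoLevelProsodyTransfer:
    """Hypothetical wrapper for the two-stage flow described in the abstract."""

    # First TTS model: text -> intermediate representation carrying the intended prosody.
    first_tts: Callable[[str], List[float]]
    # Encoder portion of the second TTS model: intermediate representation -> utterance embedding.
    encoder: Callable[[Sequence[float]], List[float]]
    # Decoder portion of the second TTS model: (text, embedding) -> output audio in the target voice.
    decoder: Callable[[str, Sequence[float]], List[float]]

    def synthesize(self, text: str) -> List[float]:
        intermediate = self.first_tts(text)    # stage 1: capture the intended prosody
        embedding = self.encoder(intermediate)  # stage 2a: summarize prosody as an embedding
        return self.decoder(text, embedding)    # stage 2b: render text with that prosody


# Toy stand-ins (assumptions for illustration only, not real models).
def toy_first_tts(text: str) -> List[float]:
    # One "frame" per word, valued by word length.
    return [float(len(word)) for word in text.split()]


def toy_encoder(frames: Sequence[float]) -> List[float]:
    # Fixed-size summary: mean and range of the frames.
    mean = sum(frames) / len(frames)
    return [mean, max(frames) - min(frames)]


def toy_decoder(text: str, embedding: Sequence[float]) -> List[float]:
    # One output sample per word, scaled by the first embedding component.
    return [embedding[0] * 0.1 for _ in text.split()]


pipeline = TwoLevelProsodyTransfer(toy_first_tts, toy_encoder, toy_decoder)
audio = pipeline.synthesize("hello big world")
```

Note that the decoder receives the original input text together with the embedding, not the intermediate representation itself. In the abstract's terms, the intermediate output contributes only the prosody, while the second model supplies the speaker characteristics of the target voice.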

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.