Patent · US Active

Customizing text-to-speech language models using adapters for conversational AI systems and applications

US12406653B2 · kind B2 · utility

Cited by	0

References	2

Claims	19

Family size	0

Assignee

NVIDIA Corporation · US

Inventors

Cheng-Ping Hsieh · Village of La Jolla, US
Subhankar Ghosh · Bengaluru, IN
Boris Ginsburg · Santa Clara, US

Key dates

Filing date	Oct 13, 2022
Grant date	Sep 2, 2025
Priority date	—
Expiry date	Jun 12, 2043

Classification

Technology area (CPC G)	Physics
CPC primary	G10L13/0335
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

In various examples, one or more text-to-speech machine learning models may be customized or adapted to accommodate new or additional speakers or speaker voices without requiring a full re-training of the models. For example, a base model may be trained on a set of one or more speakers and, after training or deployment, the model may be adapted to support one or more other speakers. To do this, one or more additional layers (e.g., adapter layers) may be added to the model, and the model may be re-trained or updated—e.g., by freezing parameters of the base model while updating parameters of the adapter layers—to generate an adapted model that can support the one or more original speakers of the base model in addition to the one or more additional speakers corresponding to the adapter layers.
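
The freeze-and-adapt idea in the abstract can be illustrated with a toy sketch. This is not the patented method or any NVIDIA implementation: the base "model" here is a single frozen linear function standing in for a trained text-to-speech network, the adapter is a hypothetical zero-initialized residual layer, and every name (`AdaptedModel`, `train_adapter`, `use_adapter`) is an assumption made for illustration. Only the adapter parameters receive gradient updates, and switching the adapter off recovers the base model's original behavior.

```python
# Toy sketch: frozen base model plus a trainable residual adapter.
# All names are hypothetical; a real TTS model would be a deep network.

class AdaptedModel:
    def __init__(self, w, b):
        self.w, self.b = w, b       # base parameters, never updated (frozen)
        self.a, self.c = 0.0, 0.0   # adapter parameters, zero-initialized

    def base(self, x):
        return self.w * x + self.b

    def forward(self, x, use_adapter=True):
        h = self.base(x)
        if not use_adapter:
            return h                        # original-speaker path
        return h + (self.a * h + self.c)    # residual adapter path

    def train_adapter(self, data, lr=0.01, steps=2000):
        # Plain gradient descent on mean squared error.
        # Gradients are computed for the adapter parameters only.
        n = len(data)
        for _ in range(steps):
            grad_a = grad_c = 0.0
            for x, y in data:
                h = self.base(x)
                err = (h + self.a * h + self.c) - y
                grad_a += 2 * err * h / n
                grad_c += 2 * err / n
            self.a -= lr * grad_a
            self.c -= lr * grad_c


def mse(model, data, use_adapter=True):
    return sum((model.forward(x, use_adapter) - y) ** 2 for x, y in data) / len(data)


model = AdaptedModel(w=2.0, b=1.0)  # stands in for a trained base model
# Made-up "new speaker" data that the base model does not fit.
new_speaker = [(x, 3.0 * x + 0.5) for x in (-1.0, 0.0, 1.0, 2.0)]

loss_before = mse(model, new_speaker)
model.train_adapter(new_speaker)
loss_after = mse(model, new_speaker)
```

Initializing the adapter to zero means the adapted model starts out identical to the base model, so training only has to learn the difference needed for the new speaker. Because `w` and `b` are never touched, calling `forward(x, use_adapter=False)` still returns the base model's output after adaptation.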

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.