Techniques for disentangled variational speech representation learning for zero-shot voice conversion
US12354594B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Event | Date |
| --- | --- |
| Filing date | Apr 19, 2022 |
| Grant date | Jul 8, 2025 |
| Priority date | — |
| Expiry date | May 13, 2043 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L17/18
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method for disentangled variational speech representation learning for voice conversion, performed by at least one processor, is provided. The method includes receiving input speech segments, encoding the input speech segments via a shared encoder to generate a speaker embedding and a content embedding, encoding the posterior distributions of the speaker embedding via a speaker encoder and encoding the posterior distributions of the content embedding via a content encoder to obtain encoded results, and decoding the encoded results by concatenating the encoded results to obtain a reconstructed speech output.
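The data flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. Every name, dimension, and weight below is hypothetical, the "layers" are untrained random linear maps, and two details are assumptions the abstract does not state: the speaker latent is pooled over time, and zero-shot conversion is done by swapping in the target's speaker latent.

```python
import math
import random

random.seed(0)

# Hypothetical sizes: input feature, shared hidden, speaker latent, content latent.
FEAT, HID, SPK, CON = 8, 6, 3, 4

def linear(in_dim, out_dim):
    """Random weight matrix standing in for a trained layer (illustrative only)."""
    return [[random.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]

def apply(w, x):
    """Matrix-vector product using plain lists."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def sample_gaussian(mean, log_var):
    """Draw z = mean + sigma * eps from a diagonal Gaussian posterior."""
    return [m + math.exp(0.5 * lv) * random.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]

W_shared = linear(FEAT, HID)                                   # shared encoder
W_spk_mean, W_spk_logvar = linear(HID, SPK), linear(HID, SPK)  # speaker encoder
W_con_mean, W_con_logvar = linear(HID, CON), linear(HID, CON)  # content encoder
W_dec = linear(SPK + CON, FEAT)                                # decoder

def encode(frames):
    """Return one speaker latent per segment and one content latent per frame."""
    hidden = [apply(W_shared, f) for f in frames]
    # Assumption: speaker identity is summarized by averaging over time.
    pooled = [sum(col) / len(hidden) for col in zip(*hidden)]
    z_spk = sample_gaussian(apply(W_spk_mean, pooled), apply(W_spk_logvar, pooled))
    z_con = [sample_gaussian(apply(W_con_mean, h), apply(W_con_logvar, h))
             for h in hidden]
    return z_spk, z_con

def decode(z_spk, z_con):
    """Concatenate the speaker latent with each content latent, then decode."""
    return [apply(W_dec, z_spk + z_c) for z_c in z_con]

def convert(source_frames, target_frames):
    """Assumed conversion step: source content combined with target speaker."""
    z_spk_target, _ = encode(target_frames)
    _, z_con_source = encode(source_frames)
    return decode(z_spk_target, z_con_source)
```

Calling `decode(*encode(frames))` corresponds to the reconstruction path in the abstract, while `convert(source_frames, target_frames)` returns one output frame per source frame, conditioned on a speaker latent taken from the target segment.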
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.