Method and apparatus for combined learning using feature enhancement based on deep neural network and modified loss function for speaker recognition robust to noisy environments
US11854554B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Mar 30, 2020 |
| Grant date | Dec 26, 2023 |
| Priority date | — |
| Expiry date | Sep 14, 2040 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L2021/02166
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Presented are a combined learning method and device that use a transformed loss function and deep-neural-network-based feature enhancement for speaker recognition that is robust to noisy environments. According to an embodiment, the combined learning method may comprise: a preprocessing step that receives a speech signal as input and learns to remove a noise or reverberation component using at least one of a beamforming algorithm and a dereverberation algorithm based on a deep neural network; a speaker embedding step that learns to classify the speaker from the speech signal from which the noise or reverberation component has been removed, using a speaker embedding model based on a deep neural network; and a step of connecting the deep neural network model included in at least one of the beamforming algorithm and the dereverberation algorithm to the speaker embedding model, and then performing combined learning of the connected models using a loss function.
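The combined learning step in the abstract can be illustrated with a toy sketch: a front end and a speaker classifier are connected, and a single loss updates both. Everything here is an assumption made for illustration and is not taken from the patent: a per-feature gain vector stands in for the DNN-based beamforming or dereverberation front end, a linear softmax classifier stands in for the speaker embedding model, plain cross-entropy stands in for the loss function (the abstract does not specify its form), and the data are synthetic. All function and variable names are hypothetical.

```python
import math
import random


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def forward(gain, W, x):
    # Front end (stand-in for feature enhancement): per-feature gain.
    enhanced = [g * xi for g, xi in zip(gain, x)]
    # Back end (stand-in for the speaker model): linear softmax classifier.
    logits = [sum(w * e for w, e in zip(row, enhanced)) for row in W]
    return enhanced, softmax(logits)


def joint_step(gain, W, x, label, lr):
    """One SGD step in which a single cross-entropy loss updates both stages."""
    enhanced, probs = forward(gain, W, x)
    d_logits = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    # Gradient w.r.t. the enhanced features, computed before W is modified.
    d_enh = [sum(d_logits[k] * W[k][d] for k in range(len(W)))
             for d in range(len(x))]
    for k, row in enumerate(W):
        for d in range(len(row)):
            row[d] -= lr * d_logits[k] * enhanced[d]
    # The same loss is backpropagated through the classifier into the front end.
    for d in range(len(gain)):
        gain[d] -= lr * d_enh[d] * x[d]


def mean_loss(gain, W, data):
    return sum(-math.log(forward(gain, W, x)[1][y]) for x, y in data) / len(data)


def make_data(rng, n=40):
    # Synthetic data (assumption): dims 0-1 carry speaker identity, dims 2-3 are noise.
    data = []
    for i in range(n):
        y = i % 2
        sign = 1.0 if y == 0 else -1.0
        x = [sign + rng.gauss(0, 0.3), -sign + rng.gauss(0, 0.3),
             rng.gauss(0, 1.0), rng.gauss(0, 1.0)]
        data.append((x, y))
    return data


def train(seed=0, epochs=20, lr=0.05):
    rng = random.Random(seed)
    data = make_data(rng)
    gain = [1.0] * 4                      # front-end parameters
    W = [[0.0] * 4 for _ in range(2)]     # classifier parameters, 2 speakers
    initial = mean_loss(gain, W, data)
    for _ in range(epochs):
        for x, y in data:
            joint_step(gain, W, x, y, lr)
    return initial, mean_loss(gain, W, data), gain


if __name__ == "__main__":
    initial, final, gain = train()
    print(f"mean loss: {initial:.3f} -> {final:.3f}")
    print("learned gains:", [round(g, 3) for g in gain])
```

The point of the sketch is only the gradient path: because the two stages are connected before training, the classification loss reaches the front-end parameters (`gain`) as well as the classifier weights (`W`), rather than each stage being trained against its own objective.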
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.