Systems and methods for human listening and live captioning
US11922963B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | May 26, 2021 |
| Grant date | Mar 5, 2024 |
| Priority date | — |
| Expiry date | May 26, 2041 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG10L25/51
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
Systems and methods are provided for generating and operating a speech enhancement model optimized for generating noise-suppressed speech outputs for improved human listening and live captioning. A computing system obtains a speech enhancement model trained on a first training dataset to generate noise-suppressed speech outputs and an automatic speech recognition model trained on a second training dataset to generate transcription labels for spoken language utterances. A third training dataset comprising a set of spoken language utterances is applied to the speech enhancement model to obtain a first noise-suppressed speech output which is applied to the automatic speech recognition model to generate a noise-suppressed transcription output for the set of spoken language utterances. Speech enhancement model parameters are updated to optimize the speech enhancement model to generate optimized noise-suppressed speech outputs based on a comparison of the noise-suppressed transcription output and ground truth transcription labels.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.