Dual use of audio noise level in speech-to-text framework
US11335350B2 · kind B2 · utility
Key dates
| Event | Date |
|---|---|
| Filing date | Oct 12, 2021 |
| Grant date | May 17, 2022 |
| Priority date | — |
| Expiry date | Oct 12, 2041 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L2025/783
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
An apparatus includes processor(s) to:
- perform pre-processing operations, including:
  - derive an audio noise level of speech audio of a speech data set;
  - derive, based on the audio noise level, a first relative weighting for first and second segmentation techniques for identifying likely sentence pauses in the speech audio; and
  - select, based on the first relative weighting, likely sentence pauses for a converged set of likely sentence pauses from the likely sentence pauses identified by the first and/or second segmentation techniques; and
- perform speech-to-text processing operations, including:
  - divide the speech data set into data segments representing speech segments of the speech audio, based on the converged set of likely sentence pauses; and
  - derive, based on the audio noise level, a second relative weighting for selecting words indicated by an acoustic model or by a language model as being most likely spoken in the speech audio, for inclusion in a transcript.
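The abstract does not give formulas for either weighting. The Python sketch below shows one way a single noise estimate could serve both uses, and every detail in it is an assumption rather than the patented method. The assumptions are: the noise level is estimated as the mean RMS (dBFS) of the quietest frames; it is mapped linearly onto a weight in [0, 1] between hypothetical thresholds of -60 and -20 dBFS; segmentation technique "A" is assumed to be the one favored in quiet audio; a candidate pause survives a weighted vote of at least 0.5; and word selection uses a weighted sum of acoustic-model and language-model log-probabilities that leans toward the language model as noise rises. All function names (`noise_level_db`, `noise_to_weight`, `converge_pauses`, `split_segments`, `pick_word`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# All names, thresholds, and formulas below are hypothetical illustrations;
# the patent abstract does not specify any of them.


def noise_level_db(samples: Sequence[float], frame_len: int = 400,
                   quiet_fraction: float = 0.1) -> float:
    """Estimate the noise floor (dBFS) as the mean RMS of the quietest frames."""
    rms = [
        math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    if not rms:
        raise ValueError("need at least one full frame of audio")
    rms.sort()
    k = max(1, int(len(rms) * quiet_fraction))
    floor = sum(rms[:k]) / k
    return 20.0 * math.log10(max(floor, 1e-10))


def noise_to_weight(noise_db: float, quiet_db: float = -60.0,
                    noisy_db: float = -20.0) -> float:
    """Map noise linearly onto [0, 1]: 1.0 at or below quiet_db, 0.0 at or above noisy_db."""
    t = (noise_db - quiet_db) / (noisy_db - quiet_db)
    return 1.0 - min(1.0, max(0.0, t))


def converge_pauses(pauses_a: List[float], pauses_b: List[float],
                    weight_a: float, tolerance: float = 0.25) -> List[float]:
    """First use of the noise level: weighted vote between two segmenters.

    A candidate pause (seconds) is kept when weight_a * (found by A) +
    (1 - weight_a) * (found by B) reaches 0.5; near-duplicates are merged.
    """
    converged: List[float] = []
    for t in sorted(set(pauses_a) | set(pauses_b)):
        in_a = any(abs(t - p) <= tolerance for p in pauses_a)
        in_b = any(abs(t - p) <= tolerance for p in pauses_b)
        score = weight_a * in_a + (1.0 - weight_a) * in_b
        if score >= 0.5 and not (converged and t - converged[-1] <= tolerance):
            converged.append(t)
    return converged


def split_segments(num_samples: int, pauses_s: List[float],
                   rate: int) -> List[Tuple[int, int]]:
    """Cut the audio into (start, end) sample ranges at the converged pauses."""
    cuts = [int(p * rate) for p in pauses_s if 0 < int(p * rate) < num_samples]
    bounds = [0] + cuts + [num_samples]
    return list(zip(bounds[:-1], bounds[1:]))


def pick_word(acoustic_logp: Dict[str, float], lm_logp: Dict[str, float],
              weight_acoustic: float, missing: float = -1e9) -> str:
    """Second use of the noise level: blend acoustic-model and language-model scores.

    Each candidate word scores weight_acoustic * acoustic + (1 - weight_acoustic) * LM
    (log-probabilities); a word one model did not propose gets the `missing` penalty.
    """
    def score(word: str) -> float:
        return (weight_acoustic * acoustic_logp.get(word, missing)
                + (1.0 - weight_acoustic) * lm_logp.get(word, missing))
    return max(sorted(set(acoustic_logp) | set(lm_logp)), key=score)


if __name__ == "__main__":
    rate = 16000
    audio = [0.001 * (1 if i % 2 else -1) for i in range(3 * rate)]  # quiet toy signal
    noise_db = noise_level_db(audio)
    w = noise_to_weight(noise_db)  # same mapping reused for both weightings here
    # Hard-coded pause lists stand in for the output of two segmentation techniques.
    pauses = converge_pauses([1.0, 2.0], [1.1, 3.0], weight_a=w)
    print(round(noise_db, 1), w, pauses, split_segments(len(audio), pauses, rate))
    print(pick_word({"wreck": -1.0, "recognize": -2.0},
                    {"wreck": -6.0, "recognize": -0.5}, weight_acoustic=w))
```

The sketch computes the noise level once and reuses it for both stages, which is the "dual use" the title refers to. It reuses one mapping for both weightings only for brevity; nothing in the abstract says the two weightings share a mapping or thresholds.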
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.