Multimodal speech recognition method and system, and computer-readable storage medium
US12112744B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Mar 2, 2022 |
| Grant date | Oct 8, 2024 |
| Priority date | — |
| Expiry date | Feb 21, 2043 |
Classification
- Technology area (CPC Y)Emerging Cross-Sectional Technologies
- CPC primaryY02D30/70
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
The disclosure provides a multimodal speech recognition method and system, and a computer-readable storage medium. The method includes calculating a first logarithmic mel-frequency spectral coefficient and a second logarithmic mel-frequency spectral coefficient when a target millimeter-wave signal and a target audio signal both contain speech information corresponding to a target user; inputting the first and the second logarithmic mel-frequency spectral coefficient into a fusion network to determine a target fusion feature, where the fusion network includes at least a calibration module and a mapping module, the calibration module is configured to perform mutual feature calibration on the target audio/millimeter-wave signals, and the mapping module is configured to fuse a calibrated millimeter-wave feature and a calibrated audio feature; and inputting the target fusion feature into a semantic feature network to determine a speech recognition result corresponding to the target user. The disclosure can implement high-accuracy speech recognition.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.