Patent · US Active

Multimodal speech recognition method and system, and computer-readable storage medium

US12112744B2 · kind B2 · utility

0Cited by
1References
14Claims
0Family size

Assignee

Inventors

Key dates

Filing dateMar 2, 2022
Grant dateOct 8, 2024
Priority date
Expiry dateFeb 21, 2043

Classification

  • Technology area (CPC Y)Emerging Cross-Sectional Technologies
  • CPC primaryY02D30/70
  • WIPO fieldComputer technology
  • WIPO sectorElectrical engineering

Abstract

The disclosure provides a multimodal speech recognition method and system, and a computer-readable storage medium. The method includes calculating a first logarithmic mel-frequency spectral coefficient and a second logarithmic mel-frequency spectral coefficient when a target millimeter-wave signal and a target audio signal both contain speech information corresponding to a target user; inputting the first and the second logarithmic mel-frequency spectral coefficient into a fusion network to determine a target fusion feature, where the fusion network includes at least a calibration module and a mapping module, the calibration module is configured to perform mutual feature calibration on the target audio/millimeter-wave signals, and the mapping module is configured to fuse a calibrated millimeter-wave feature and a calibrated audio feature; and inputting the target fusion feature into a semantic feature network to determine a speech recognition result corresponding to the target user. The disclosure can implement high-accuracy speech recognition.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.