Patent · US Active

Multimodal speech recognition method and system, and computer-readable storage medium

US12112744B2 · kind B2 · utility

Cited by	0

References	1

Claims	14

Family size	0

Assignee

ZHEJIANG UNIVERSITY · CN

Inventors

Feng Lin · Katy, US
Tiantian Liu · Beijing, CN
Ming Gao · Shandong, CN
Chao-Hsi Wang · Taoyuan, TW
Zhongjie Ba · Hangzhou City, CN
Jinsong Han · Hong Kong, CN
Wenyao Xu · Walnut, US
Kui REN · Hangzhou City, CN

Key dates

Filing date	Mar 2, 2022
Grant date	Oct 8, 2024
Priority date	—
Expiry date	Feb 21, 2043

Classification

Technology area (CPC Y)	Emerging Cross-Sectional Technologies
CPC primary	Y02D30/70
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

The disclosure provides a multimodal speech recognition method and system, and a computer-readable storage medium. The method includes calculating a first logarithmic mel-frequency spectral coefficient and a second logarithmic mel-frequency spectral coefficient when a target millimeter-wave signal and a target audio signal both contain speech information corresponding to a target user; inputting the first and the second logarithmic mel-frequency spectral coefficient into a fusion network to determine a target fusion feature, where the fusion network includes at least a calibration module and a mapping module, the calibration module is configured to perform mutual feature calibration on the target audio/millimeter-wave signals, and the mapping module is configured to fuse a calibrated millimeter-wave feature and a calibrated audio feature; and inputting the target fusion feature into a semantic feature network to determine a speech recognition result corresponding to the target user. The disclosure can implement high-accuracy speech recognition.
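
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not specify frame sizes, the number of mel bands, or the internals of the calibration and mapping modules. The sketch assumes a naive DFT, an 8-band triangular mel filterbank, a sigmoid cross-gating step as a stand-in for mutual feature calibration, and a weighted sum as a stand-in for the mapping module. All function names and parameters (log_mel, mutual_calibrate, fuse, n_mels, weight) are hypothetical.

```python
import cmath
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT power spectrum of one real-valued frame (bins 0..N/2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def log_mel(frame, sample_rate, n_mels=8, eps=1e-10):
    """Log of the mel-band energies for a single frame."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_mels, len(frame), sample_rate)
    return [math.log(sum(w * p for w, p in zip(row, spec)) + eps)
            for row in bank]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mutual_calibrate(audio_feat, mmwave_feat):
    """Assumed stand-in: each modality is gated by a sigmoid of the other."""
    audio_cal = [a * sigmoid(m) for a, m in zip(audio_feat, mmwave_feat)]
    mmwave_cal = [m * sigmoid(a) for a, m in zip(audio_feat, mmwave_feat)]
    return audio_cal, mmwave_cal


def fuse(audio_cal, mmwave_cal, weight=0.5):
    """Assumed stand-in for the mapping module: per-band weighted sum."""
    return [weight * a + (1.0 - weight) * m
            for a, m in zip(audio_cal, mmwave_cal)]


# Hypothetical input: a 1 kHz tone stands in for both modalities.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(64)]
audio_feat = log_mel(frame, sample_rate)
mmwave_feat = log_mel([0.5 * x for x in frame], sample_rate)
fused = fuse(*mutual_calibrate(audio_feat, mmwave_feat))
```

In a real system the gating and the weighted sum would be replaced by learned layers, and the fused feature would be passed to the semantic feature network that produces the recognition result. The sketch only shows how the two feature streams flow into one fused vector.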

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.