Patent · US Active

Vision-assisted speech processing

US11257493B2 · kind B2 · utility

Cited by	18
References	7
Claims	22
Family size	0

Assignee

SoundHound, Inc. · US

Inventors

Cristina Vasconcelos · Montréal, CA
Zili Li · Barrington, US

Key dates

Filing date	Jul 11, 2019
Grant date	Feb 22, 2022
Priority date	—
Expiry date	May 31, 2040

Classification

Technology area (CPC G)	Physics
CPC primary	G10L2015/223
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Systems and methods for processing speech are described. In certain examples, image data is used to generate visual feature tensors and audio data is used to generate audio feature tensors. The visual feature tensors and the audio feature tensors are used by a linguistic model to determine linguistic features that are usable to parse an utterance of a user. The generation of the feature tensors may be jointly configured with the linguistic model. Systems may be provided in a client-server architecture.
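
The data flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class name, method names, dimensions, toy vocabulary, and the single-layer projections are all assumptions chosen for illustration, and the weights are random placeholders rather than jointly trained parameters. The sketch only shows image data and audio data being mapped to feature tensors, which a linguistic model then consumes together. The client-server split mentioned in the abstract is not modelled.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, x: Vector) -> Vector:
    """Multiply a weight matrix (rows x cols) by a vector of length cols."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def softmax(z: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def make_weights(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Random placeholder weights; a real system would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class VisionAssistedSpeechSketch:
    """Hypothetical toy model: two feature extractors feeding one linguistic model."""

    def __init__(self, image_dim: int, audio_dim: int, feature_dim: int,
                 vocab: List[str], seed: int = 0) -> None:
        rng = random.Random(seed)
        self.vocab = vocab
        self.w_visual = make_weights(feature_dim, image_dim, rng)
        self.w_audio = make_weights(feature_dim, audio_dim, rng)
        # The linguistic model sees both feature tensors concatenated.
        self.w_ling = make_weights(len(vocab), 2 * feature_dim, rng)

    def visual_features(self, image: Vector) -> Vector:
        """Map flattened image data to a visual feature tensor."""
        return [math.tanh(v) for v in linear(self.w_visual, image)]

    def audio_features(self, audio: Vector) -> Vector:
        """Map an audio frame to an audio feature tensor."""
        return [math.tanh(v) for v in linear(self.w_audio, audio)]

    def linguistic_features(self, image: Vector, audio: Vector) -> Vector:
        """Return a probability distribution over the toy vocabulary."""
        fused = self.visual_features(image) + self.audio_features(audio)
        return softmax(linear(self.w_ling, fused))


if __name__ == "__main__":
    model = VisionAssistedSpeechSketch(
        image_dim=4, audio_dim=3, feature_dim=2, vocab=["k", "ae", "t"]
    )
    probs = model.linguistic_features([0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7])
    for unit, p in zip(model.vocab, probs):
        print(f"{unit}: {p:.3f}")
```

In a jointly configured system, the three weight matrices would be trained together so that the feature extractors learn representations useful to the linguistic model. Here they are initialised independently and never updated.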

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.