Patent · US Active

Rescoring automatic speech recognition hypotheses using audio-visual matching

US12334054B2 · kind B2 · utility

Cited by	0

References	6

Claims	26

Family size	0

Assignee

Google LLC · US

Inventors

Olivier Siohan · New York, US
Takaki Makino · Mountain View, US
Richard Cameron Rose · Watchung, US
Otavio Braga · Mountain View, US
Hank Liao · Mountain View, US
Basilio Garcia Castillo · Mountain View, US

Key dates

Filing date	Nov 18, 2019
Grant date	Jun 17, 2025
Priority date	—
Expiry date	Aug 20, 2040

Classification

Technology area (CPC G)	Physics
CPC primary	G10L15/16
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A method (400) includes receiving audio data (112) corresponding to an utterance (101) spoken by a user (10), receiving video data (114) representing motion of lips of the user while the user was speaking the utterance, and obtaining multiple candidate transcriptions (135) for the utterance based on the audio data. For each candidate transcription of the multiple candidate transcriptions, the method also includes generating a synthesized speech representation (145) of the corresponding candidate transcription and determining an agreement score (155) indicating a likelihood that the synthesized speech representation matches the motion of the lips of the user while the user speaks the utterance. The method also includes selecting one of the multiple candidate transcriptions for the utterance as a speech recognition output (175) based on the agreement scores determined for the multiple candidate transcriptions for the utterance.
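The selection step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function name `rescore_by_lip_agreement`, the `synthesize` and `agreement` callables, and the toy token-matching stand-ins are all hypothetical, chosen only to show the control flow (synthesize a representation per candidate, score it against the lip motion, pick the best-scoring candidate). A real system would presumably use a speech synthesizer and a trained audio-visual matching model in their place.

```python
from typing import Callable, List, Sequence, Tuple


def rescore_by_lip_agreement(
    candidates: Sequence[str],
    lip_video: object,
    synthesize: Callable[[str], object],
    agreement: Callable[[object, object], float],
) -> Tuple[str, List[Tuple[str, float]]]:
    """Pick the candidate whose synthesized speech best agrees with the lip motion.

    `synthesize` and `agreement` are hypothetical placeholders for the
    synthesis and scoring steps named in the abstract.
    """
    if not candidates:
        raise ValueError("at least one candidate transcription is required")
    # One agreement score per candidate transcription.
    scored = [(text, agreement(synthesize(text), lip_video)) for text in candidates]
    # The highest agreement score wins; ties resolve to the earliest candidate.
    best_text, _ = max(scored, key=lambda pair: pair[1])
    return best_text, scored


# Toy stand-ins (assumptions for illustration only): "synthesized speech" is a
# list of lowercase word tokens, and the "lip motion" is a list of tokens too.
def toy_synthesize(text: str) -> List[str]:
    return text.lower().split()


def toy_agreement(synth: List[str], lip_tokens: List[str]) -> float:
    # Fraction of aligned positions whose tokens match.
    matches = sum(1 for a, b in zip(synth, lip_tokens) if a == b)
    return matches / max(len(synth), len(lip_tokens), 1)


lip_tokens = ["recognize", "speech"]
candidates = ["wreck a nice beach", "recognize speech"]
best, scores = rescore_by_lip_agreement(
    candidates, lip_tokens, toy_synthesize, toy_agreement
)
print(best)
```

With these toy stand-ins, the second candidate matches the lip tokens at every position and is selected. Passing the synthesis and scoring steps as callables keeps the selection logic independent of whichever models are actually used.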

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.