Patent · US Active

Instance level scene recognition with a vision language model

US12387510B2 · kind B2 · utility

Cited by	0
References	5
Claims	20
Family size	0

Assignee

Google LLC · US

Inventors

Harshit Kharbanda · Pleasanton, US
Boris Bluntschli · Zürich, CH
Vibhuti Mahajan · Los Angeles, US
Louis Wang · San Francisco, US

Key dates

Filing date	Mar 28, 2024
Grant date	Aug 12, 2025
Priority date	—
Expiry date	Mar 28, 2044

Classification

Technology area (CPC G)	Physics
CPC primary	G06V10/82
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Systems and methods for image understanding can include one or more object recognition systems and one or more vision language models to generate an augmented language output that can be both scene-aware and object-aware. The systems and methods can process an input image with an object recognition model to generate an object recognition output descriptive of identification details for an object depicted in the input image. The systems and methods can include processing the input image with a vision language model to generate a language output descriptive of a predicted scene description. The object recognition output can then be utilized to augment the language output to generate an augmented language output that includes the scene understanding of the language output with the specificity of the object recognition output.
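
The data flow described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: `recognize_object` and `describe_scene` are hypothetical stubs standing in for an object recognition model and a vision language model, the `ObjectRecognition` fields are assumed, and the simple text substitution in `augment_description` is an assumed stand-in for whatever augmentation the claims actually cover.

```python
import re
from dataclasses import dataclass


@dataclass
class ObjectRecognition:
    """Assumed shape of an object recognition output (hypothetical)."""
    generic_label: str   # coarse category the scene text is likely to use
    instance_name: str   # instance-level identification details


def recognize_object(image: bytes) -> ObjectRecognition:
    # Hypothetical stub: a real system would run an object recognition model.
    return ObjectRecognition(generic_label="tower",
                             instance_name="the Eiffel Tower")


def describe_scene(image: bytes) -> str:
    # Hypothetical stub: a real system would run a vision language model.
    return "A crowd gathers near a tower at sunset."


def augment_description(scene: str, rec: ObjectRecognition) -> str:
    """Swap the first generic mention for the specific instance name.

    If the scene text never mentions the generic label, append the
    identification as a separate sentence instead.
    """
    pattern = re.compile(
        r"\b(?:a|an|the)\s+" + re.escape(rec.generic_label) + r"\b",
        re.IGNORECASE,
    )
    # A callable replacement avoids backslash processing of the name.
    augmented, count = pattern.subn(lambda _m: rec.instance_name, scene, count=1)
    if count == 0:
        augmented = f"{scene} The image shows {rec.instance_name}."
    return augmented


def understand_image(image: bytes) -> str:
    """Run both models on the same image, then combine their outputs."""
    rec = recognize_object(image)
    scene = describe_scene(image)
    return augment_description(scene, rec)
```

With the stubbed outputs above, `understand_image(b"")` yields a sentence that keeps the scene context from the language output while naming the specific object from the recognition output.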

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.