Patent · US Active

Multimodal image classifier using textual and visual embeddings

US11907337B2 · kind B2 · utility

Cited by	0
References	0
Claims	14
Family size	0

Assignee

Google LLC · US

Inventors

Ariel Fuxman · San Jose, US
Aleksei Timofeev · Mountain View, US
Zhen Li · Urbana, US
Chun-Ta Lu · Sunnyvale, US
Manan Shah · Miami, US
Chen Sun · San Francisco, US
Krishnamurthy Viswanathan · Village of La Jolla, US
Chao Jia · Nanhu, CN

Key dates

Filing date	Nov 18, 2019
Grant date	Feb 20, 2024
Priority date	—
Expiry date	Nov 18, 2039

Classification

Technology area (CPC G)	Physics
CPC primary	G06V10/82
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for realizing a multimodal image classifier. In an aspect, a method includes, for each image of a plurality of images: processing the image by a textual generator model to obtain a set of phrases that are descriptive of the content of the image, wherein each phrase is one or more terms, processing the set of phrases by a textual embedding model to obtain an embedding of predicted text for the image, and processing the image using an image embedding model to obtain an embedding of image pixels of the image. Then a multimodal image classifier is trained on the embeddings of predicted text for the images and the embeddings of image pixels for the images to produce, as output, labels of an output taxonomy to classify an image based on the image as input.
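The data flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not specify any model architecture, so every function here (generate_phrases, embed_text, embed_image, featurize, train, classify) is a hypothetical toy stand-in, images are assumed to be flat lists of pixel intensities in [0, 1], and a nearest-centroid rule is swapped in for the multimodal classifier. Only the order of steps follows the abstract: image to phrases, phrases to text embedding, image to pixel embedding, then a classifier trained on both embeddings.

```python
import zlib
from collections import defaultdict

TEXT_DIM = 8  # assumed size of the toy text embedding


def generate_phrases(image):
    """Hypothetical stand-in for the textual generator model."""
    mean = sum(image) / len(image)
    return ["bright scene"] if mean > 0.5 else ["dark scene"]


def embed_text(phrases, dim=TEXT_DIM):
    """Toy text embedding: hashed bag of terms (assumption, not the patent's model)."""
    vec = [0.0] * dim
    for phrase in phrases:
        for term in phrase.split():
            # crc32 is deterministic across runs, unlike the built-in hash()
            vec[zlib.crc32(term.encode("utf-8")) % dim] += 1.0
    return vec


def embed_image(image):
    """Toy pixel embedding: simple intensity statistics (assumption)."""
    return [sum(image) / len(image), min(image), max(image)]


def featurize(image):
    """Concatenate the predicted-text embedding and the pixel embedding."""
    return embed_text(generate_phrases(image)) + embed_image(image)


def train(images, labels):
    """Nearest-centroid training: average the features seen for each label."""
    sums = {}
    counts = defaultdict(int)
    for image, label in zip(images, labels):
        feats = featurize(image)
        previous = sums.get(label, [0.0] * len(feats))
        sums[label] = [s + f for s, f in zip(previous, feats)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def classify(centroids, image):
    """Return the label whose centroid is closest to the image's features."""
    feats = featurize(image)

    def squared_distance(label):
        return sum((f - c) ** 2 for f, c in zip(feats, centroids[label]))

    return min(centroids, key=squared_distance)


# Made-up training data; "day" and "night" stand in for an output taxonomy.
train_images = [
    [0.90, 0.80, 0.95, 0.70],
    [0.85, 0.90, 0.80, 0.75],
    [0.10, 0.20, 0.05, 0.15],
    [0.20, 0.10, 0.10, 0.30],
]
train_labels = ["day", "day", "night", "night"]

centroids = train(train_images, train_labels)
prediction = classify(centroids, [0.80, 0.90, 0.70, 0.85])
print(prediction)
```

At inference time only the image is supplied; the phrases and both embeddings are derived from it inside featurize, which mirrors the abstract's statement that the classifier labels an image "based on the image as input".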

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.