Patent · US Active

Joint representation learning from images and text

US11948078B2 · kind B2 · utility

Cited by	1

References	0

Claims	20

Family size	0

Assignee

NVIDIA Corporation · US

Inventors

Arash Vahdat · Mountain View, US
Tanmay Gupta · Hillsboro, US
Xiaodong Yang · New York, US
Jan Kautz · Lexington, US

Key dates

Filing date	Aug 21, 2020
Grant date	Apr 2, 2024
Priority date	—
Expiry date	Jan 31, 2043

Classification

Technology area (CPC G)	Physics
CPC primary	G06V30/274
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

The disclosure provides a framework or system for learning visual representation using a large set of image/text pairs. The disclosure provides, for example, a method of visual representation learning, a joint representation learning system, and an artificial intelligence (AI) system that employs one or more of the trained models from the method or system. The AI system can be used, for example, in autonomous or semi-autonomous vehicles. In one example, the method of visual representation learning includes: (1) receiving a set of image embeddings from an image representation model and a set of text embeddings from a text representation model, and (2) training, employing mutual information, a critic function by learning relationships between the set of image embeddings and the set of text embeddings.
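To make step (2) of the abstract concrete, the sketch below trains a critic over paired image and text embeddings by maximizing a lower bound on their mutual information. The abstract names neither the bound nor the form of the critic, so several choices here are assumptions made only for illustration: the InfoNCE contrastive bound, a bilinear critic f(x, y) = xᵀWy, unit-normalized embeddings, and plain gradient descent in NumPy. The names `info_nce_loss` and `train_critic` are hypothetical and do not come from the patent.

```python
import numpy as np


def _row_softmax(scores):
    """Numerically stable softmax over each row."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def info_nce_loss(image_emb, text_emb, W):
    """Negative InfoNCE bound for the assumed bilinear critic f(x, y) = x^T W y.

    Row i of image_emb and row i of text_emb are treated as a matched pair;
    every other row in the batch serves as a negative example.
    """
    scores = image_emb @ W @ text_emb.T  # [i, j] scores image i against text j
    probs = _row_softmax(scores)
    return -np.mean(np.log(np.diag(probs)))


def train_critic(image_emb, text_emb, steps=100, lr=0.5):
    """Fit W by gradient descent on the InfoNCE loss (illustrative only)."""
    n, d_img = image_emb.shape
    d_txt = text_emb.shape[1]
    W = np.zeros((d_img, d_txt))  # critic parameters, assumed bilinear form
    losses = []
    for _ in range(steps):
        losses.append(info_nce_loss(image_emb, text_emb, W))
        probs = _row_softmax(image_emb @ W @ text_emb.T)
        # Gradient of the loss w.r.t. the score matrix is (probs - I) / n;
        # since scores = X W Y^T, the chain rule gives X^T (probs - I) Y / n.
        grad_W = image_emb.T @ ((probs - np.eye(n)) / n) @ text_emb
        W -= lr * grad_W
    losses.append(info_nce_loss(image_emb, text_emb, W))
    return W, losses
```

With W initialized to zero, every candidate pair scores equally, so the starting loss equals log N for a batch of N pairs. As the critic learns to rank matched pairs above mismatched ones, the loss falls and the implied mutual-information estimate (log N minus the loss) rises. In a full system, the embeddings would come from the image and text representation models mentioned in the abstract rather than being supplied as fixed arrays.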

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.