Patent · US Active

Natural language selection of objects in image data

US12045288B1 · kind B1 · utility

Cited by: 3
References: 2
Claims: 18
Family size: 0

Assignee

Inventors

Key dates

Filing date: Sep 24, 2020
Grant date: Jul 23, 2024
Priority date
Expiry date: Jun 23, 2041

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06N20/00
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Devices and techniques are generally described for selection of objects in image data using natural language input. In various examples, first image data representing at least a first object and first natural language data may be received. In some examples, first embedding data representing the first natural language data may be generated. Second embedding data representing the first image data may be generated. Relative location data indicating a location of the first object in the first image data relative to at least one other object may be generated. The first embedding data, the second embedding data, and the relative location data may be input into a multi-modal transformer model. The multi-modal transformer model may determine that the first natural language data relates to the first object.
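
The abstract does not say how the relative location data is encoded. As a rough illustration only, the sketch below assumes each detected object has an axis-aligned bounding box and represents one object's location relative to another as the offset between box centers, normalized by image size. The names `Box`, `relative_location`, and `relative_location_features` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (assumed format)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def center(self) -> Tuple[float, float]:
        # Midpoint of the box along each axis.
        return ((self.x_min + self.x_max) / 2.0,
                (self.y_min + self.y_max) / 2.0)


def relative_location(target: Box, other: Box,
                      width: float, height: float) -> Tuple[float, float]:
    """Offset of target's center from other's center, scaled by image size.

    This encoding is an assumption made for illustration; the patent
    abstract only states that relative location data is generated.
    """
    tx, ty = target.center
    ox, oy = other.center
    return ((tx - ox) / width, (ty - oy) / height)


def relative_location_features(boxes: List[Box], width: float,
                               height: float) -> List[List[Tuple[float, float]]]:
    """For each box, list its normalized offsets to every other box."""
    return [
        [relative_location(b, o, width, height)
         for j, o in enumerate(boxes) if j != i]
        for i, b in enumerate(boxes)
    ]


# Two hypothetical objects in a 100x100 image.
boxes = [Box(0, 0, 20, 20), Box(60, 40, 100, 80)]
features = relative_location_features(boxes, width=100, height=100)
```

In a pipeline like the one the abstract outlines, features of this kind would be supplied to the multi-modal transformer alongside the text and image embeddings. A negative first component here means the object lies to the left of the one it is compared against, which is the sort of cue a phrase such as "the one on the left" would rely on.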

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.