Patent · US Active

System and method for supervised contrastive learning for multi-modal tasks

US12183062B2 · kind B2 · utility

Cited by: 0
References: 1
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jan 31, 2022
Grant date: Dec 31, 2024
Priority date
Expiry date: Feb 18, 2043

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06V10/82
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method includes obtaining a batch of training data including multiple paired image-text pairs and multiple unpaired image-text pairs, where each paired image-text pair and each unpaired image-text pair includes an image and a text. The method also includes training a machine learning model using the training data based on an optimization of a combination of losses. The losses include, for each paired image-text pair, (i) a first multi-modal representation loss based on the paired image-text pair and (ii) a second multi-modal representation loss based on two or more unpaired image-text pairs, selected from among the multiple unpaired image-text pairs, wherein each of the two or more unpaired image-text pairs includes either the image or the text of the paired image-text pair.
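
The combination of losses described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the form of either loss, so everything concrete here is an assumption made for illustration: the cosine-similarity losses, the `margin` and `weight` parameters, the identifier-keyed embedding dictionaries, and the function names `paired_loss`, `unpaired_loss`, `select_unpaired`, and `combined_loss` are all hypothetical and are not taken from the patent.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def paired_loss(image_vec, text_vec):
    """Assumed first loss: pull the paired image and text embeddings together."""
    return 1.0 - cosine(image_vec, text_vec)


def select_unpaired(pair, unpaired):
    """Keep unpaired pairs that reuse either the image or the text of `pair`."""
    image_id, text_id = pair
    return [(i, t) for (i, t) in unpaired if i == image_id or t == text_id]


def unpaired_loss(selected, image_emb, text_emb, margin=0.2):
    """Assumed second loss: penalize similarity above `margin` for unpaired pairs."""
    if len(selected) < 2:
        raise ValueError("the abstract calls for two or more unpaired pairs")
    sims = [cosine(image_emb[i], text_emb[t]) for (i, t) in selected]
    return sum(max(0.0, s - margin) for s in sims) / len(sims)


def combined_loss(paired, unpaired, image_emb, text_emb, weight=1.0, margin=0.2):
    """Average, over paired pairs, of the first loss plus the weighted second loss."""
    total = 0.0
    for pair in paired:
        image_id, text_id = pair
        first = paired_loss(image_emb[image_id], text_emb[text_id])
        selected = select_unpaired(pair, unpaired)
        second = unpaired_loss(selected, image_emb, text_emb, margin)
        total += first + weight * second
    return total / len(paired)


# Toy batch: two paired pairs and two unpaired (mismatched) pairs.
paired = [("i0", "t0"), ("i1", "t1")]
unpaired = [("i0", "t1"), ("i1", "t0")]
image_emb = {"i0": [1.0, 0.0], "i1": [0.0, 1.0]}
aligned_text_emb = {"t0": [1.0, 0.0], "t1": [0.0, 1.0]}
misaligned_text_emb = {"t0": [0.0, 1.0], "t1": [1.0, 0.0]}

loss_aligned = combined_loss(paired, unpaired, image_emb, aligned_text_emb)
loss_misaligned = combined_loss(paired, unpaired, image_emb, misaligned_text_emb)
```

In this toy batch the combined loss is lower when each text embedding matches its paired image than when the text embeddings are swapped. In an actual training setup the embeddings would come from the machine learning model being trained and the loss would be minimized by an optimizer; neither is shown here.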

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.