System and method for supervised contrastive learning for multi-modal tasks
US12183062B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jan 31, 2022 |
| Grant date | Dec 31, 2024 |
| Priority date | — |
| Expiry date | Feb 18, 2043 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06V10/82
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method includes obtaining a batch of training data including multiple paired image-text pairs and multiple unpaired image-text pairs, where each paired image-text pair and each unpaired image-text pair includes an image and a text. The method also includes training a machine learning model using the training data based on an optimization of a combination of losses. The losses include, for each paired image-text pair, (i) a first multi-modal representation loss based on the paired image-text pair and (ii) a second multi-modal representation loss based on two or more unpaired image-text pairs, selected from among the multiple unpaired image-text pairs, wherein each of the two or more unpaired image-text pairs includes either the image or the text of the paired image-text pair.
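The loss combination described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the form of either loss, the model, or how the losses are weighted, so everything concrete here is an assumption for illustration only: the dot-product `score`, the softplus-based loss terms, the `weight` parameter, and the function name `combined_loss` are hypothetical and are not taken from the patent.

```python
import math


def score(image_vec, text_vec):
    # Hypothetical matching score: dot product of the two embeddings.
    return sum(a * b for a, b in zip(image_vec, text_vec))


def softplus(x):
    # log(1 + exp(x)), computed in a numerically stable way.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def combined_loss(images, texts, paired, unpaired, weight=1.0):
    """Average, over paired image-text pairs, of two assumed loss terms.

    images, texts: dicts mapping an id to an embedding (list of floats).
    paired, unpaired: lists of (image_id, text_id) tuples.
    """
    total = 0.0
    for img_id, txt_id in paired:
        # (i) First loss (assumed form): raise the score of the paired pair.
        first = softplus(-score(images[img_id], texts[txt_id]))

        # Select unpaired pairs that reuse either the image or the text
        # of this paired pair, as the abstract requires.
        related = [(i, t) for i, t in unpaired if i == img_id or t == txt_id]
        if len(related) < 2:
            raise ValueError("need two or more related unpaired pairs")

        # (ii) Second loss (assumed form): lower the scores of those pairs.
        second = sum(softplus(score(images[i], texts[t])) for i, t in related)
        second /= len(related)

        total += first + weight * second
    return total / len(paired)


# Toy batch: two paired pairs and two unpaired pairs built by mixing them.
images = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
texts = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
paired = [("a", "a"), ("b", "b")]
unpaired = [("a", "b"), ("b", "a")]
loss = combined_loss(images, texts, paired, unpaired)
```

In this toy batch each paired pair has exactly two related unpaired pairs, one sharing its image and one sharing its text. In a training loop the embeddings would come from the model being trained and the combined value would be minimized by an optimizer; that part is omitted here.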
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.