Self-supervised multimodal representation learning with cascade positive example mining
US12400449B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Sep 8, 2022 |
| Grant date | Aug 26, 2025 |
| Priority date | — |
| Expiry date | Mar 9, 2044 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06V10/82
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method for model training and deployment includes training, by a processor, a model to learn video representations with a self-supervised contrastive loss by performing progressive training in phases with an incremental number of positive instances from one or more video sequences, resetting the learning rate schedule in each of the phases, and inheriting model weights from a checkpoint from a previous training phase. The method further includes updating the trained model with the self-supervised contrastive loss given multiple positive instances obtained from Cascade K-Nearest Neighbor mining of the one or more video sequences by extracting features in different modalities to compute similarities between the one or more video sequences and selecting the top-k similar instances with features in different modalities. The method also includes fine-tuning the trained model for a downstream task. The method additionally includes deploying the trained model for a target application inference for the downstream task.
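The mining step in the abstract can be illustrated with a minimal sketch. It is one plausible reading of "Cascade K-Nearest Neighbor mining", not the patented procedure. The sketch assumes that candidates are filtered one modality at a time, that each stage keeps the top-k candidates by cosine similarity to the anchor, and that only candidates surviving every stage count as positives. The function names, the `ks` parameter, the choice of cosine similarity, and the `rgb`/`flow` modality names are all hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cascade_knn_positives(
    anchor: int,
    features_by_modality: List[List[Vector]],
    ks: List[int],
) -> List[int]:
    """Mine positives for video `anchor` (assumed cascade scheme).

    features_by_modality[m][i] is the feature of video i in modality m.
    ks[m] is how many candidates are kept after filtering on modality m.
    """
    if len(features_by_modality) != len(ks):
        raise ValueError("need exactly one k per modality")
    num_videos = len(features_by_modality[0])
    # Start from every video except the anchor itself.
    candidates = [i for i in range(num_videos) if i != anchor]
    for feats, k in zip(features_by_modality, ks):
        # Rank the surviving candidates by similarity in this modality.
        ranked = sorted(
            candidates,
            key=lambda i: cosine_similarity(feats[anchor], feats[i]),
            reverse=True,
        )
        candidates = ranked[:k]  # keep only the top-k for the next stage
    return candidates


# Toy data: four videos, two hypothetical modalities.
rgb = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]
flow = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [1.0, 0.0]]

print(cascade_knn_positives(0, [rgb, flow], ks=[2, 1]))  # -> [2]
```

In the toy data, video 3 matches the anchor exactly in the second modality but is discarded in the first stage, so only video 2 survives both filters. Under this reading, the number of positives retained (the last entry of `ks`) is the quantity that the progressive training phases would increase from one phase to the next.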
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.