Patent · US Active

Self-supervised multimodal representation learning with cascade positive example mining

US12400449B2 · kind B2 · utility

0Cited by
0References
20Claims
0Family size

Assignee

Inventors

Key dates

Filing dateSep 8, 2022
Grant dateAug 26, 2025
Priority date
Expiry dateMar 9, 2044

Classification

  • Technology area (CPC G)Physics
  • CPC primaryG06V10/82
  • WIPO fieldComputer technology
  • WIPO sectorElectrical engineering

Abstract

A method for model training and deployment includes training, by a processor, a model to learn video representations with a self-supervised contrastive loss by performing progressive training in phases with an incremental number of positive instances from one or more video sequences, resetting the learning rate schedule in each of the phases, and inheriting model weights from a checkpoint from a previous training phase. The method further includes updating the trained model with the self-supervised contrastive loss given multiple positive instances obtained from Cascade K-Nearest Neighbor mining of the one or more video sequences by extracting features in different modalities to compute similarities between the one or more video sequences and selecting a top-k similar instances with features in different modalities. The method also includes fine-tuning the trained model for a downstream task. The method additionally includes deploying the trained model for a target application inference for the downstream task.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.