Patent · US Active

Self-supervised multimodal representation learning with cascade positive example mining

US12400449B2 · kind B2 · utility

Cited by	0
References	0
Claims	20
Family size	0

Assignee

NEC CORPORATION · JP

Inventors

Farley Lai · Plainsboro, US
Asim Kadav · Mountain View, US
Cheng-En Wu · Hsinchu, TW

Key dates

Filing date	Sep 8, 2022
Grant date	Aug 26, 2025
Priority date	—
Expiry date	Mar 9, 2044

Classification

Technology area (CPC G)	Physics
CPC primary	G06V10/82
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A method for model training and deployment includes training, by a processor, a model to learn video representations with a self-supervised contrastive loss by performing progressive training in phases with an incremental number of positive instances from one or more video sequences, resetting the learning rate schedule in each of the phases, and inheriting model weights from a checkpoint from a previous training phase. The method further includes updating the trained model with the self-supervised contrastive loss given multiple positive instances obtained from Cascade K-Nearest Neighbor mining of the one or more video sequences by extracting features in different modalities to compute similarities between the one or more video sequences and selecting a top-k similar instances with features in different modalities. The method also includes fine-tuning the trained model for a downstream task. The method additionally includes deploying the trained model for a target application inference for the downstream task.
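
The mining step in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not define how the cascade is staged, so the sketch assumes that each modality acts as one stage, that each stage keeps the top-k most similar survivors of the previous stage, and that similarity is cosine similarity. The names `cosine_similarity`, `cascade_knn`, and the per-stage list `ks` are hypothetical, chosen only for illustration.

```python
import numpy as np


def cosine_similarity(query, bank):
    """Cosine similarity between one query vector and each row of bank."""
    q = query / np.linalg.norm(query)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    return b @ q


def cascade_knn(query_feats, bank_feats, ks):
    """Return indices of mined positives for one query instance.

    query_feats: one 1-D feature vector per modality (e.g. RGB, optical flow).
    bank_feats:  one (N, D_m) array per modality, with rows aligned across
                 modalities; assumed not to contain the query itself.
    ks:          number of candidates kept at each stage (assumed decreasing).
    """
    # Start with every instance in the bank as a candidate.
    candidates = np.arange(bank_feats[0].shape[0])
    for query, bank, k in zip(query_feats, bank_feats, ks):
        # Score only the survivors of the previous stage in this modality.
        sims = cosine_similarity(query, bank[candidates])
        k = min(k, len(candidates))
        # Keep the k most similar candidates, most similar first.
        order = np.argsort(-sims, kind="stable")[:k]
        candidates = candidates[order]
    return candidates
```

Under these assumptions, an instance survives only if it ranks highly in every modality in turn, so a clip that matches the query in one modality but not in an earlier one is filtered out. The returned indices would then serve as the multiple positive instances in the contrastive loss.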

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.