Patent · US Active

Leveraging unsupervised meta-learning to boost few-shot action recognition

US12087043B2 · kind B2 · utility

Cited by	1
References	0
Claims	20
Family size	0

Assignee

MICROSOFT TECHNOLOGY LICENSING, LLC · US

Inventors

Gaurav Mittal · Pittsburgh, US
Ye Yu · Redmond, US
Mei CHEN · Bellevue, US
Jay Sanjay Patravali · Corvallis, US

Key dates

Filing date	Nov 24, 2021
Grant date	Sep 10, 2024
Priority date	—
Expiry date	Oct 15, 2042

Classification

Technology area (CPC G)	Physics
CPC primary	G06N3/084
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

The disclosure herein describes preparing and using a cross-attention model for action recognition using pre-trained encoders and novel class fine-tuning. Training video data is transformed into augmented training video segments, which are used to train an appearance encoder and an action encoder. The appearance encoder is trained to encode video segments based on spatial semantics and the action encoder is trained to encode video segments based on spatio-temporal semantics. A set of hard-mined training episodes is generated using the trained encoders. The cross-attention module is then trained for action-appearance aligned classification using the hard-mined training episodes. Then, support video segments are obtained, wherein each support video segment is associated with video classes. The cross-attention module is fine-tuned using the obtained support video segments and the associated video classes. A query video segment is obtained and classified as a video class using the fine-tuned cross-attention module.
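The final classification step described in the abstract can be illustrated with a toy sketch. It is not the patented method: the abstract does not say how the cross-attention module combines the two encoders' outputs or how a class is chosen. The sketch assumes that each video segment has already been encoded into a short list of action feature vectors and a list of appearance feature vectors, that action features attend over appearance features with scaled dot-product attention, and that the query is assigned to the nearest class prototype (the mean of the fused support embeddings). All function names (`cross_attend`, `fuse`, `classify`) are hypothetical, and encoder training, hard-mined episodes, and fine-tuning are omitted.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attend(queries, keys_values):
    # Scaled dot-product cross-attention: each query vector attends over
    # keys_values, which serve as both keys and values (an assumption).
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys_values])
        out.append([sum(w * v[i] for w, v in zip(weights, keys_values))
                    for i in range(len(q))])
    return out

def fuse(action_feats, appearance_feats):
    # Action features attend over appearance features; the attended result is
    # added back (residual) and mean-pooled into one embedding per segment.
    attended = cross_attend(action_feats, appearance_feats)
    fused = [[a + c for a, c in zip(act, att)]
             for act, att in zip(action_feats, attended)]
    dim = len(fused[0])
    return [sum(f[i] for f in fused) / len(fused) for i in range(dim)]

def classify(query, support):
    # query: (action_feats, appearance_feats)
    # support: dict mapping class label -> list of (action_feats, appearance_feats)
    # Nearest-prototype rule (an assumption, not stated in the abstract).
    q = fuse(*query)
    best_label, best_dist = None, float("inf")
    for label, examples in support.items():
        embs = [fuse(*ex) for ex in examples]
        proto = [sum(e[i] for e in embs) / len(embs) for i in range(len(q))]
        dist = sum((a - b) ** 2 for a, b in zip(q, proto))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

if __name__ == "__main__":
    # Made-up 2-D features standing in for encoder outputs.
    support = {
        "wave": [([[1.0, 0.0], [0.9, 0.1]], [[1.0, 0.0], [0.8, 0.2]])],
        "jump": [([[0.0, 1.0], [0.1, 0.9]], [[0.0, 1.0], [0.2, 0.8]])],
    }
    query = ([[0.95, 0.05], [1.0, 0.0]], [[0.9, 0.1], [1.0, 0.0]])
    print(classify(query, support))
```

In a trained system the attention would use learned projections and the module would be fine-tuned on the support segments before classifying the query; the fixed arithmetic here only shows how the two feature streams can be aligned and compared.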

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.