Patent · US Active

Text-based framework for video object selection

US12266181B2 · kind B2 · utility

Cited by: 0
References: 0
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Nov 19, 2021
Grant date: Apr 1, 2025
Priority date
Expiry date: Nov 28, 2043

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06V20/46
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Embodiments are disclosed for receiving a user input and an input video comprising multiple frames. The method may include extracting a text feature from the user input. The method may further include extracting a plurality of image features from the frames. The method may further include identifying one or more keyframes from the frames that include the object. The method may further include clustering one or more groups of the one or more keyframes. The method may further include generating a plurality of segmentation masks for each group. The method may further include determining a set of reference masks corresponding to the user input and the object. The method may further include generating a set of fusion masks by combining the plurality of segmentation masks and the set of reference masks. The method may further include propagating the set of fusion masks and outputting a final set of masks.
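
The abstract names the pipeline stages but not how each one is computed. The following is a minimal sketch of three of those stages (keyframe identification, keyframe clustering, and mask fusion) under assumptions that do not come from the patent: keyframes are chosen by cosine similarity between the text feature and each frame feature, groups are runs of temporally adjacent keyframes, and fusion is a pixelwise AND of a segmentation mask with a reference mask. The function names, the threshold, and the gap parameter are hypothetical, and feature extraction and mask propagation are left out.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_keyframes(text_feat, frame_feats, threshold=0.8):
    """Return indices of frames whose feature is close to the text feature.

    Assumption: similarity thresholding stands in for the patent's
    unspecified keyframe criterion.
    """
    return [i for i, f in enumerate(frame_feats)
            if cosine(text_feat, f) >= threshold]


def cluster_keyframes(keyframes, max_gap=1):
    """Group sorted keyframe indices into runs separated by more than max_gap.

    Assumption: temporal adjacency stands in for the patent's
    unspecified clustering rule.
    """
    groups = []
    for k in keyframes:
        if groups and k - groups[-1][-1] <= max_gap:
            groups[-1].append(k)
        else:
            groups.append([k])
    return groups


def fuse_masks(seg_mask, ref_mask):
    """Combine two binary masks of equal shape by pixelwise AND.

    Assumption: AND is only one possible way to combine masks.
    """
    return [[s & r for s, r in zip(srow, rrow)]
            for srow, rrow in zip(seg_mask, ref_mask)]


# Toy data: 2-D features for one text query and five frames.
text_feat = [1.0, 0.0]
frame_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]

keyframes = select_keyframes(text_feat, frame_feats)
groups = cluster_keyframes(keyframes)
fused = fuse_masks([[1, 1], [0, 1]], [[1, 0], [0, 1]])
```

With this toy data, frames 0, 1, and 3 pass the similarity threshold, and the gap between frames 1 and 3 splits them into two groups. A real system would replace the hand-written vectors with features from learned text and image encoders.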

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.