Text-based framework for video object selection
US12266181B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Nov 19, 2021 |
| Grant date | Apr 1, 2025 |
| Priority date | — |
| Expiry date | Nov 28, 2043 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06V20/46
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Embodiments are disclosed for receiving a user input and an input video comprising multiple frames. The method may include extracting a text feature from the user input. The method may further include extracting a plurality of image features from the frames. The method may further include identifying one or more keyframes from the frames that include the object. The method may further include clustering one or more groups of the one or more keyframes. The method may further include generating a plurality of segmentation masks for each group. The method may further include determining a set of reference masks corresponding to the user input and the object. The method may further include generating a set of fusion masks by combining the plurality of segmentation masks and the set of reference masks. The method may further include propagating the set of fusion masks and outputting a final set of masks.
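The abstract names the pipeline stages but not how each is computed. The following is a minimal sketch of three of those stages (keyframe identification, keyframe clustering, and mask fusion) on toy data, not the patented method. Everything specific here is an assumption made for illustration: the function names, the use of cosine similarity between the text feature and each image feature, the `threshold` and `max_gap` parameters, grouping keyframes by temporal adjacency, and fusing masks with a pixel-wise logical AND.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def select_keyframes(text_feature, frame_features, threshold=0.5):
    """Return indices of frames whose image feature is similar to the text feature.

    Assumption: similarity to the text feature stands in for "frame includes the object".
    """
    return [i for i, f in enumerate(frame_features)
            if cosine(text_feature, f) >= threshold]

def cluster_keyframes(indices, max_gap=1):
    """Group sorted keyframe indices into runs; a gap above max_gap starts a new group.

    Assumption: temporal adjacency is the clustering criterion.
    """
    groups = []
    for i in indices:
        if groups and i - groups[-1][-1] <= max_gap:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups

def fuse_masks(segmentation_mask, reference_mask):
    """Combine two binary masks pixel-wise with logical AND (an assumed fusion rule)."""
    return [[s & r for s, r in zip(s_row, r_row)]
            for s_row, r_row in zip(segmentation_mask, reference_mask)]

# Toy data: 2-D features for one text query and five frames.
text_feature = [1.0, 0.0]
frame_features = [[0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9], [1.0, 0.0]]

keyframes = select_keyframes(text_feature, frame_features)
groups = cluster_keyframes(keyframes)
fused = fuse_masks([[1, 1], [0, 1]], [[1, 0], [0, 1]])

print(keyframes)  # [0, 1, 4]
print(groups)     # [[0, 1], [4]]
print(fused)      # [[1, 0], [0, 1]]
```

In a real system the features would come from learned text and image encoders and the masks from a segmentation model. The final propagation step, which carries fusion masks to the remaining frames, is omitted here.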
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.