Joint heterogeneous language-vision embeddings for video tagging and search
US11409791B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jun 12, 2017 |
| Grant date | Aug 9, 2022 |
| Priority date | — |
| Expiry date | Oct 29, 2038 |
Classification
- Technology area (CPC H)Electricity
- CPC primaryH04N21/8405
- WIPO fieldAudio-visual technology
- WIPO sectorElectrical engineering
Abstract
Systems, methods and articles of manufacture for modeling a joint language-visual space. A textual query to be evaluated relative to a video library is received from a requesting entity. The video library contains a plurality of instances of video content. One or more instances of video content from the video library that correspond to the textual query are determined, by analyzing the textual query using a data model that includes a soft-attention neural network module that is jointly trained with a language Long Short-term Memory (LSTM) neural network module and a video LSTM neural network module. At least an indication of the one or more instances of video content is returned to the requesting entity.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.