Multimodal data processing
US12333795B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Sep 15, 2022 |
| Grant date | Jun 17, 2025 |
| Priority date | — |
| Expiry date | Nov 29, 2043 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06V10/80
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Disclosed are a method for processing multimodal data using a neural network, a device, and a medium, relating to the field of artificial intelligence and, in particular, to multimodal data processing, video classification, and deep learning. The neural network includes: an input subnetwork configured to receive the multimodal data and output respective first features of a plurality of modalities; a plurality of cross-modal feature subnetworks, each configured to receive the respective first features of two corresponding modalities and output a cross-modal feature corresponding to those two modalities; a plurality of cross-modal fusion subnetworks, each configured to receive at least one cross-modal feature corresponding to a target modality and the other modalities and output a second feature of the target modality; and an output subnetwork configured to receive the respective second features of the plurality of modalities and output a processing result of the multimodal data.
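The data flow described in the abstract can be illustrated with a minimal Python sketch. This is only a toy illustration of the four stages, not the patented implementation: the function names, the elementwise product used for the pairwise feature, the elementwise mean used for fusion, and the concatenation used for the output are all assumptions chosen for brevity, and learned layers are replaced by fixed arithmetic.

```python
from itertools import combinations


def input_subnetwork(data):
    # Assumed stand-in for per-modality encoders: cast raw values to floats.
    return {modality: [float(x) for x in values] for modality, values in data.items()}


def cross_modal_feature(feat_a, feat_b):
    # Assumed pairwise interaction: elementwise product of two first features.
    return [a * b for a, b in zip(feat_a, feat_b)]


def cross_modal_fusion(cross_feats):
    # Assumed fusion: elementwise mean over all cross-modal features
    # that involve the target modality.
    n = len(cross_feats)
    return [sum(vals) / n for vals in zip(*cross_feats)]


def output_subnetwork(second_feats):
    # Assumed output head: concatenate second features in sorted modality order.
    return [x for modality in sorted(second_feats) for x in second_feats[modality]]


def process(data):
    # Stage 1: first features, one per modality.
    first = input_subnetwork(data)
    # Stage 2: one cross-modal feature per pair of modalities.
    cross = {
        frozenset((a, b)): cross_modal_feature(first[a], first[b])
        for a, b in combinations(sorted(first), 2)
    }
    # Stage 3: one second feature per target modality, fused from every
    # cross-modal feature whose pair contains that modality.
    second = {
        target: cross_modal_fusion([f for pair, f in cross.items() if target in pair])
        for target in first
    }
    # Stage 4: processing result from all second features.
    return output_subnetwork(second)


result = process({"image": [1, 2], "text": [3, 4], "audio": [5, 6]})
```

The sketch assumes at least two modalities and feature vectors of equal length. In a real system each stage would be a trainable module, and the output head would typically produce something like class scores for video classification rather than a concatenated vector.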
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.