Patent · US Active

Multimodal data processing

US12333795B2 · kind B2 · utility

Cited by: 0
References: 0
Claims: 18
Family size: 0

Assignee

Inventors

Key dates

Filing date: Sep 15, 2022
Grant date: Jun 17, 2025
Priority date
Expiry date: Nov 29, 2043

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06V10/80
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Disclosed are a method for processing multimodal data using a neural network, a device, and a medium, which relate to the field of artificial intelligence and, in particular, to multimodal data processing, video classification, and deep learning. The neural network includes: an input subnetwork configured to receive the multimodal data and output respective first features of a plurality of modalities; a plurality of cross-modal feature subnetworks, each of which is configured to receive the respective first features of two corresponding modalities and output a cross-modal feature corresponding to the two modalities; a plurality of cross-modal fusion subnetworks, each of which is configured to receive at least one cross-modal feature corresponding to a corresponding target modality and other modalities and output a second feature of the target modality; and an output subnetwork configured to receive the respective second features of the plurality of modalities and output a processing result of the multimodal data.
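
The data flow named in the abstract can be sketched in a few lines of Python. This is a minimal illustration only, not the patented implementation: the abstract does not say what each subnetwork computes, so the elementwise product, the averaging, and the final sum are placeholder assumptions, as are all function names and the choice of one cross-modal feature subnetwork per unordered pair of modalities. It assumes at least two modalities with feature vectors of equal length.

```python
from itertools import combinations


def input_subnetwork(multimodal_data):
    # Assumption: the "first feature" of each modality is just its raw vector as floats.
    return {m: [float(x) for x in v] for m, v in multimodal_data.items()}


def cross_modal_feature(feat_a, feat_b):
    # Assumption: an elementwise product stands in for the learned pairwise subnetwork.
    return [a * b for a, b in zip(feat_a, feat_b)]


def cross_modal_fusion(target, cross_feats):
    # Gather every cross-modal feature that involves the target modality and
    # average them (placeholder for a learned fusion) into its "second feature".
    related = [f for pair, f in cross_feats.items() if target in pair]
    return [sum(col) / len(related) for col in zip(*related)]


def output_subnetwork(second_feats):
    # Assumption: the processing result is a single score, the sum of all
    # second-feature entries across modalities.
    return sum(x for m in sorted(second_feats) for x in second_feats[m])


def process(multimodal_data):
    first = input_subnetwork(multimodal_data)
    # One cross-modal feature per unordered pair of modalities (hypothetical choice).
    cross = {
        frozenset((a, b)): cross_modal_feature(first[a], first[b])
        for a, b in combinations(sorted(first), 2)
    }
    second = {m: cross_modal_fusion(m, cross) for m in first}
    return first, cross, second, output_subnetwork(second)


if __name__ == "__main__":
    sample = {"image": [1, 2], "text": [3, 4], "audio": [5, 6]}
    _, cross, second, result = process(sample)
    print(len(cross), second, result)
```

With three modalities the sketch builds three pairwise cross-modal features, and each target modality fuses the two that involve it. A real system would replace every placeholder with trained layers.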

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.