Patent · US Active

Multi-modal model training method and apparatus, image recognition method and apparatus, and electronic device

US12260629B2 · kind B2 · utility

Cited by: 0
References: 0
Claims: 18
Family size: 0

Assignee

Inventors

Key dates

Filing date: Sep 28, 2022
Grant date: Mar 25, 2025
Priority date
Expiry date: Sep 28, 2042

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06V10/86
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

The present application relates to the field of image recognition. Disclosed are a multi-modal model training method and apparatus, an image recognition method and apparatus, and an electronic device. The method comprises: acquiring a sample image and a text feature vector corresponding to the sample image; inputting the sample image into a feature extraction network of an initial multi-modal model, so as to generate an image feature vector corresponding to the sample image; inputting the text feature vector and the image feature vector into a transformer structure of the initial multi-modal model, and outputting candidate texts corresponding to the sample image; and updating parameters of the initial multi-modal model according to a target text corresponding to the text feature vector, and the candidate texts, so as to determine a target multi-modal model.
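
The training flow in the abstract (extract image features, fuse them with the text feature vector, output candidate texts, update parameters against the target text) can be sketched as a toy example. This is an illustration only, not the patented implementation. Everything named here is an assumption: the vocabulary, the chunked mean pooling that stands in for the feature extraction network, the parameter-free single-head attention that stands in for the transformer structure, the cross-entropy loss, and the choice to train only the output projection `W`. The abstract does not specify any of these.

```python
import math
import random

VOCAB = ["cat", "dog", "car", "tree", "boat"]  # hypothetical candidate texts

def extract_image_features(image, dim):
    """Toy stand-in for the feature extraction network: chunked mean pooling."""
    pixels = [p for row in image for p in row]
    chunk = len(pixels) // dim
    return [sum(pixels[i * chunk:(i + 1) * chunk]) / chunk for i in range(dim)]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(text_vec, image_vec):
    """Parameter-free single-head self-attention over two tokens, then mean pooling."""
    tokens = [text_vec, image_vec]
    scale = math.sqrt(len(text_vec))
    fused = []
    for q in tokens:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in tokens])
        fused.append([sum(w * v[j] for w, v in zip(weights, tokens))
                      for j in range(len(q))])
    return [sum(col) / len(fused) for col in zip(*fused)]

def predict(W, h):
    """Project the fused vector onto the vocabulary and normalise to probabilities."""
    return softmax([sum(w * x for w, x in zip(row, h)) for row in W])

def top_candidates(probs, k=3):
    """Return the k most probable candidate texts, best first."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [VOCAB[i] for i in order[:k]]

def train_step(W, h, target_idx, lr=0.5):
    """One gradient-descent step on cross-entropy; returns the loss before the update."""
    probs = predict(W, h)
    loss = -math.log(probs[target_idx])
    for i, row in enumerate(W):
        err = probs[i] - (1.0 if i == target_idx else 0.0)  # dL/dlogit_i
        for j in range(len(row)):
            row[j] -= lr * err * h[j]
    return loss

random.seed(0)
image = [[(4 * r + c) / 16 for c in range(4)] for r in range(4)]  # toy 4x4 sample image
text_vec = [0.2, 0.7, 0.1, 0.5]                                   # toy text feature vector
target = "dog"                                                    # toy target text
W = [[random.uniform(-0.1, 0.1) for _ in range(4)] for _ in VOCAB]

h = fuse(text_vec, extract_image_features(image, 4))
losses = [train_step(W, h, VOCAB.index(target)) for _ in range(50)]
candidates = top_candidates(predict(W, h))
```

In this sketch `losses` records the loss before each update and `candidates` holds the ranked candidate texts after training. A real system would also backpropagate through the feature extraction network and the transformer layers rather than updating a single projection matrix.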

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.