Patent · US Active

Multi-modal model training method and apparatus, image recognition method and apparatus, and electronic device

US12260629B2 · kind B2 · utility

Cited by	0

References	0

Claims	18

Family size	0

Assignee

SUZHOU METABRAIN INTELLIGENT TECHNOLOGY CO., LTD. · CN

Inventors

Chong SHEN · San Jose, US
Feng Li · Beijing, CN

Key dates

Filing date	Sep 28, 2022
Grant date	Mar 25, 2025
Priority date	—
Expiry date	Sep 28, 2042

Classification

Technology area (CPC G)	Physics
CPC primary	G06V10/86
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

The present application relates to the field of image recognition and discloses a multi-modal model training method and apparatus, an image recognition method and apparatus, and an electronic device. The training method comprises: acquiring a sample image and a text feature vector corresponding to the sample image; inputting the sample image into a feature extraction network of an initial multi-modal model to generate an image feature vector corresponding to the sample image; inputting the text feature vector and the image feature vector into a transformer structure of the initial multi-modal model, which outputs candidate texts corresponding to the sample image; and updating parameters of the initial multi-modal model according to the candidate texts and a target text corresponding to the text feature vector, thereby determining a target multi-modal model.
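
The four steps of the abstract can be illustrated with a toy sketch. This is not the patented implementation. Every name, dimension, and design choice below is an assumption made for illustration. The feature extraction network is reduced to one linear layer with tanh, the transformer structure to a single dot-product attention step over the text and image vectors, and each candidate text to a single token id from a five-entry vocabulary. For brevity only the output projection is updated; a real model would backpropagate through all parameters.

```python
import math
import random

random.seed(0)

D = 4        # shared feature width (assumed)
VOCAB = 5    # toy vocabulary size; each "text" is one token id (assumed)
PIXELS = 6   # length of the flattened toy image (assumed)

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class ToyMultiModalModel:
    """Hypothetical stand-in for the 'initial multi-modal model'."""

    def __init__(self):
        self.W_img = rand_matrix(D, PIXELS)  # stand-in feature extraction network
        self.W_out = rand_matrix(VOCAB, D)   # projection from fused vector to vocabulary

    def extract_image_feature(self, image):
        # Step 2: sample image -> image feature vector.
        return [math.tanh(v) for v in matvec(self.W_img, image)]

    def fuse(self, text_vec, image_vec):
        # Step 3 (simplified): the text vector attends over both vectors.
        tokens = [text_vec, image_vec]
        scores = [sum(q * k for q, k in zip(text_vec, t)) / math.sqrt(D)
                  for t in tokens]
        weights = softmax(scores)
        return [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(D)]

    def forward(self, image, text_vec):
        hidden = self.fuse(text_vec, self.extract_image_feature(image))
        probs = softmax(matvec(self.W_out, hidden))  # scores over candidate texts
        return hidden, probs

def train_step(model, image, text_vec, target_id, lr=0.5):
    # Step 4: compare candidate scores with the target text and update parameters.
    hidden, probs = model.forward(image, text_vec)
    loss = -math.log(probs[target_id])  # cross-entropy against the target text
    for k in range(VOCAB):
        # Gradient of cross-entropy w.r.t. logit k is probs[k] - one_hot[k].
        grad_logit = probs[k] - (1.0 if k == target_id else 0.0)
        for j in range(D):
            model.W_out[k][j] -= lr * grad_logit * hidden[j]
    return loss

# Step 1: a sample image and its text feature vector (made-up values).
model = ToyMultiModalModel()
image = [0.2, 0.9, 0.1, 0.4, 0.7, 0.3]
text_vec = [0.5, -0.3, 0.8, 0.1]
losses = [train_step(model, image, text_vec, target_id=2) for _ in range(20)]

# Rank candidate texts by score after training.
_, probs = model.forward(image, text_vec)
candidates = sorted(range(VOCAB), key=lambda k: -probs[k])
```

After the loop, `losses` holds the per-step cross-entropy and `candidates` lists token ids from most to least probable. The model left after the final update plays the role of the "target multi-modal model" in this sketch.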

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.