Patent · US Active

Pre-training method, image and text retrieval method for a vision and scene text aggregation model, electronic device, and storage medium

US12347158B2 · kind B2 · utility

Cited by	0

References	2

Claims	19

Family size	0

Assignee

BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD. · CN

Inventors

Yipeng Sun · Beijing, CN
Mengjun Cheng · Beijing, CN
Longchao Wang · Beijing, CN
Xiongwei Zhu · Beijing, CN
Kun Yao · Tangxia, CN
Junyu Han · Beijing, CN
Jingtuo Liu · Beijing, CN
Errui Ding · Beijing, CN
Jingdong Wang · Beijing, CN
Haifeng Wang · معلمی نژاد, US

Key dates

Filing date	Mar 29, 2023
Grant date	Jul 1, 2025
Priority date	—
Expiry date	Mar 12, 2044

Classification

Technology area (CPC Y)	Emerging Cross-Sectional Technologies
CPC primary	Y02D10/00
WIPO field	Audio-visual technology
WIPO sector	Electrical engineering

Abstract

A pre-training method for a Vision and Scene Text Aggregation model includes: acquiring a sample image-text pair; extracting a sample scene text from a sample image; inputting a sample text into a text encoding network to obtain a sample text feature; inputting the sample image and an initial sample aggregation feature into a visual encoding subnetwork and inputting the initial sample aggregation feature and the sample scene text into a scene encoding subnetwork to obtain a global image feature of the sample image and a learned sample aggregation feature; and pre-training the Vision and Scene Text Aggregation model according to the sample text feature, the global image feature of the sample image, and the learned sample aggregation feature.
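The data flow recited in the abstract can be illustrated with a minimal sketch. Everything below is a hypothetical stand-in, not the patented implementation: the encoders are toy character-bucket embeddings rather than neural networks, the "image" is a dictionary that already carries its scene text (in place of an OCR step), the two subnetwork outputs are merged by simple averaging, and the cosine-based loss is an assumed placeholder, since the abstract does not state the training objective. All function and key names are invented for illustration.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
DIM = 4  # toy feature width, chosen only for illustration


def embed(tokens: List[str]) -> Vector:
    """Toy deterministic embedding: bucket characters by code point, then L2-normalise."""
    vec = [0.0] * DIM
    for token in tokens:
        for ch in token:
            vec[ord(ch) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def mean(a: Vector, b: Vector) -> Vector:
    """Element-wise average of two vectors (assumed merge rule)."""
    return [(x + y) / 2.0 for x, y in zip(a, b)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def extract_scene_text(image: Dict[str, List[str]]) -> List[str]:
    # Stand-in for OCR: the toy image already carries its scene text.
    return image["scene_text"]


def text_encoder(text: str) -> Vector:
    # Stand-in for the text encoding network.
    return embed(text.lower().split())


def visual_subnetwork(image: Dict[str, List[str]], aggregation: Vector) -> Tuple[Vector, Vector]:
    # Returns the global image feature and a visually updated aggregation feature.
    global_feature = embed(image["patches"])
    return global_feature, mean(aggregation, global_feature)


def scene_subnetwork(aggregation: Vector, scene_text: List[str]) -> Vector:
    # Returns an aggregation feature updated with scene-text information.
    return mean(aggregation, embed(scene_text))


def pretraining_step(image: Dict[str, List[str]], text: str) -> Dict[str, object]:
    """One forward pass over a sample image-text pair, following the abstract's steps."""
    scene_text = extract_scene_text(image)
    text_feature = text_encoder(text)
    initial_aggregation = [0.0] * DIM  # assumed zero initialisation
    global_feature, visual_agg = visual_subnetwork(image, initial_aggregation)
    scene_agg = scene_subnetwork(initial_aggregation, scene_text)
    learned_aggregation = mean(visual_agg, scene_agg)
    # Assumed placeholder objective: pull both image-side features toward the text feature.
    loss = (1.0 - cosine(text_feature, global_feature)) + (
        1.0 - cosine(text_feature, learned_aggregation)
    )
    return {
        "text_feature": text_feature,
        "global_image_feature": global_feature,
        "learned_aggregation_feature": learned_aggregation,
        "loss": loss,
    }


sample_image = {"patches": ["storefront", "awning"], "scene_text": ["CAFE", "OPEN"]}
result = pretraining_step(sample_image, "a cafe with an open sign")
print(result["loss"])
```

In a real system the loss would be back-propagated to update the encoders; the sketch stops at the forward pass because the abstract specifies only which three features the pre-training depends on.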

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.