Patent · US Active

Unified pretraining framework for document understanding

US12333845B2 · kind B2 · utility

Cited by	0
References	1
Claims	15
Family size	0

Assignee

Adobe Inc. · US

Inventors

Jiuxiang Gu · College Park, US
Ani Nenkova · Philadelphia, US
Nikolaos Barmpalios · Palo Alto, US
Vlad Ion Morariu · Potomac, US
Tong Sun · Hangzhou City, CN
Rajiv Jain · Falls Church, US
Jason Wen Yong Kuen · Singapore, SG
Handong Zhao · San Jose, US

Key dates

Filing date	Nov 16, 2021
Grant date	Jun 17, 2025
Priority date	—
Expiry date	May 31, 2043

Classification

Technology area (CPC G)	Physics
CPC primary	G06V30/147
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

The technology described includes methods for pretraining a document encoder model based on multimodal self cross-attention. One method includes receiving image data that encodes a set of pretraining documents. A set of sentences is extracted from the image data. A bounding box for each sentence is generated. For each sentence, a set of predicted features is generated by using an encoder machine-learning model. The encoder model performs cross-attention between a set of masked-textual features for the sentence and a set of masked-visual features for the sentence. The set of masked-textual features is based on a masking function and the sentence. The set of masked-visual features is based on the masking function and the corresponding bounding box. A document-encoder model is pretrained based on the set of predicted features for each sentence and pretraining tasks. The pretraining tasks include masked sentence modeling, visual contrastive learning, or visual-language alignment.
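
The masking and cross-attention step described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the helper names (`mask_rows`, `cross_attention`), the zero-vector masking scheme, the 0.3 masking probability, and the toy 4-dimensional features are all assumptions made for illustration, and the learned projection matrices a real encoder would use are omitted.

```python
import math
import random


def mask_rows(features, mask_prob, rng):
    """Hypothetical masking function: replace whole feature vectors with zeros."""
    masked, flags = [], []
    for row in features:
        hide = rng.random() < mask_prob
        flags.append(hide)
        masked.append([0.0] * len(row) if hide else list(row))
    return masked, flags


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention where queries come from one modality
    and keys/values come from the other (no learned projections here)."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    output = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        output.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return output


rng = random.Random(0)

# Toy stand-ins for one sentence: 3 textual feature vectors and
# 2 visual feature vectors taken from its bounding box (assumed shapes).
text_feats = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
visual_feats = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]

# The same masking function is applied to both modalities.
masked_text, text_flags = mask_rows(text_feats, 0.3, rng)
masked_visual, visual_flags = mask_rows(visual_feats, 0.3, rng)

# Each modality attends to the other; the concatenation stands in for
# the "set of predicted features" for the sentence.
text_attends_visual = cross_attention(masked_text, masked_visual, masked_visual)
visual_attends_text = cross_attention(masked_visual, masked_text, masked_text)
predicted = text_attends_visual + visual_attends_text
```

In a pretraining setup, features like `predicted` would feed the loss terms for the listed tasks (masked sentence modeling, visual contrastive learning, visual-language alignment). Those losses are outside the scope of this sketch.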

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.