Patent · US Active

Pre-training techniques for entity extraction in low resource domains

US12159109B2 · kind B2 · utility

Cited by	0

References	2

Claims	20

Family size	0

Assignee

Adobe Inc. · US

Inventors

Aniruddha Mahapatra · Sherghati, IN
Sharmila Reddy Nangi · Hyderabad, IN
Aparna Garimella · Hyderabad, IN
Anandha velu Natarajan · Komangalam, IN

Key dates

Filing date	Nov 12, 2021
Grant date	Dec 3, 2024
Priority date	—
Expiry date	Jul 18, 2043

Classification

Technology area (CPC G)	Physics
CPC primary	G06F18/22
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Embodiments of the present invention provide systems, methods, and computer storage media for pre-training entity extraction models to facilitate domain adaptation in resource-constrained domains. In an example embodiment, a first machine learning model is used to encode sentences of a source domain corpus and a target domain corpus into sentence embeddings. The sentence embeddings of the target domain corpus are combined into a target corpus embedding. Training sentences from the source domain corpus within a threshold of similarity to the target corpus embedding are selected. A second machine learning model is trained on the training sentences selected from the source domain corpus.
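
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the bag-of-words `embed` function is a toy stand-in for the first machine learning model, the target corpus embedding is assumed to be the mean of the target sentence embeddings, similarity is assumed to be cosine similarity, and "within a threshold" is read as "similarity at or above the threshold". All function names, the example sentences, and the threshold value are hypothetical. Training the second model on the selected sentences is left out.

```python
import math
from collections import Counter


def embed(sentence, vocab):
    """Toy stand-in for the first model: a bag-of-words count vector."""
    counts = Counter(sentence.lower().split())
    return [float(counts[word]) for word in vocab]


def mean_vector(vectors):
    """Combine sentence embeddings into one corpus embedding (assumed: mean)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_training_sentences(source, target, threshold):
    """Keep source sentences whose similarity to the target corpus
    embedding is at or above the (hypothetical) threshold."""
    vocab = sorted({w for s in source + target for w in s.lower().split()})
    target_embedding = mean_vector([embed(s, vocab) for s in target])
    return [
        s for s in source
        if cosine(embed(s, vocab), target_embedding) >= threshold
    ]


# Hypothetical corpora: a large general source and a small target domain.
source_corpus = [
    "the court granted the motion to dismiss",
    "the striker scored a late goal",
    "the judge denied the appeal",
]
target_corpus = [
    "the tenant shall pay rent to the landlord",
    "the court may award damages to the tenant",
]

selected = select_training_sentences(source_corpus, target_corpus, threshold=0.4)
print(selected)
```

In practice the toy encoder would be replaced by a learned sentence encoder, and the selected sentences would then serve as the training set for the second (entity extraction) model.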

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.