Patent · US Active

Pre-training techniques for entity extraction in low resource domains

US12159109B2 · kind B2 · utility

Cited by: 0
References: 2
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Nov 12, 2021
Grant date: Dec 3, 2024
Priority date
Expiry date: Jul 18, 2043

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F18/22
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Embodiments of the present invention provide systems, methods, and computer storage media for pre-training entity extraction models to facilitate domain adaptation in resource-constrained domains. In an example embodiment, a first machine learning model is used to encode sentences of a source domain corpus and a target domain corpus into sentence embeddings. The sentence embeddings of the target domain corpus are combined into a target corpus embedding. Training sentences from the source domain corpus within a threshold of similarity to the target corpus embedding are selected. A second machine learning model is trained on the training sentences selected from the source domain corpus.
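The abstract does not specify the encoder, how the target sentence embeddings are combined, or which similarity measure and threshold are used. The following is a minimal sketch of the selection step under stated assumptions, not the patented implementation. It uses a toy bag-of-words encoder as a hypothetical stand-in for the first machine learning model, an element-wise mean as the combination into a target corpus embedding, and cosine similarity against a fixed threshold. The function names (`encode`, `mean_embedding`, `cosine`, `select_training_sentences`) are hypothetical, and training the second model on the selected sentences is left out.

```python
import math
from collections import Counter


def encode(sentence: str, vocab: list[str]) -> list[float]:
    """Hypothetical stand-in for the first model: bag-of-words counts over vocab.

    A real system would use a trained sentence encoder instead.
    """
    counts = Counter(sentence.lower().split())
    return [float(counts[word]) for word in vocab]


def mean_embedding(vectors: list[list[float]]) -> list[float]:
    """Assumption: the target corpus embedding is the element-wise mean."""
    if not vectors:
        raise ValueError("target corpus must contain at least one sentence")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_training_sentences(
    source: list[str], target: list[str], threshold: float
) -> list[str]:
    """Keep source sentences whose similarity to the target corpus
    embedding meets the (assumed) threshold."""
    # Shared vocabulary so source and target vectors live in the same space.
    vocab = sorted({w for s in source + target for w in s.lower().split()})
    target_vec = mean_embedding([encode(s, vocab) for s in target])
    return [s for s in source if cosine(encode(s, vocab), target_vec) >= threshold]


if __name__ == "__main__":
    source_corpus = [
        "the patient received aspirin",
        "the stock price rose sharply",
        "aspirin reduced the patient fever",
    ]
    target_corpus = ["patient given aspirin daily", "fever in the patient subsided"]
    print(select_training_sentences(source_corpus, target_corpus, 0.5))
```

With a threshold of 0.5, the two medical source sentences are kept and the finance sentence is filtered out, because it shares only the word "the" with the target corpus. Averaging into a single corpus embedding keeps selection linear in the size of the source corpus. The trade-off is that a diverse target domain is reduced to one centroid.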

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.