Pre-training techniques for entity extraction in low resource domains
US12159109B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Nov 12, 2021 |
| Grant date | Dec 3, 2024 |
| Priority date | — |
| Expiry date | Jul 18, 2043 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F18/22
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Embodiments of the present invention provide systems, methods, and computer storage media for pre-training entity extraction models to facilitate domain adaptation in resource-constrained domains. In an example embodiment, a first machine learning model is used to encode sentences of a source domain corpus and a target domain corpus into sentence embeddings. The sentence embeddings of the target domain corpus are combined into a target corpus embedding. Training sentences from the source domain corpus within a threshold of similarity to the target corpus embedding are selected. A second machine learning model is trained on the training sentences selected from the source domain corpus.
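The selection step described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the patented method: the bag-of-words `encode` function is a toy stand-in for the first machine learning model, the target corpus embedding is taken to be the mean of the target sentence embeddings (the abstract only says they are "combined"), similarity is measured with cosine similarity, and the threshold value of 0.5 is arbitrary. Training the second model on the selected sentences is left out.

```python
from __future__ import annotations

import math


def build_vocab(sentences: list[str]) -> dict[str, int]:
    """Map every lowercase whitespace token to a fixed vector index."""
    vocab: dict[str, int] = {}
    for sentence in sentences:
        for token in sentence.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(sentence: str, vocab: dict[str, int]) -> list[float]:
    """Toy stand-in for the first model: L2-normalised bag-of-words vector."""
    vec = [0.0] * len(vocab)
    for token in sentence.lower().split():
        if token in vocab:
            vec[vocab[token]] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def mean_embedding(embeddings: list[list[float]]) -> list[float]:
    """Combine sentence embeddings into one corpus embedding (assumed: mean)."""
    count = len(embeddings)
    return [sum(column) / count for column in zip(*embeddings)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_training_sentences(
    source: list[str], target: list[str], threshold: float
) -> list[str]:
    """Keep source sentences whose similarity to the target corpus
    embedding is at least `threshold` (hypothetical selection rule)."""
    vocab = build_vocab(source + target)
    target_corpus_embedding = mean_embedding([encode(s, vocab) for s in target])
    return [
        s for s in source
        if cosine(encode(s, vocab), target_corpus_embedding) >= threshold
    ]


# Made-up example corpora, for illustration only.
source_corpus = [
    "the patient received aspirin daily",
    "stock prices fell sharply today",
    "aspirin reduced the patient fever",
]
target_corpus = [
    "the patient took aspirin",
    "aspirin helped the patient",
]
selected = select_training_sentences(source_corpus, target_corpus, threshold=0.5)
```

In this toy setup the finance sentence shares no tokens with the target corpus, so its similarity is zero and it is filtered out, while the two medical sentences are kept. A real system would presumably replace `encode` with a learned sentence encoder and tune the threshold on held-out data.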
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.