Synthetic crafting of training and test data for named entity recognition by utilizing a rule-based library
US11853699B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jan 29, 2021 |
| Grant date | Dec 26, 2023 |
| Priority date | — |
| Expiry date | Apr 12, 2041 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/951
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method and system for extracting and labeling Named-Entity Recognition (NER) data in a target language for use in a multi-lingual software module has been developed. First, a textual sentence is translated to the target language using a translation module. A named entity is identified and extracted within the translated sentence. The named entity is identified by either: exact mapping; a semantically similar translated named entity that meets a predetermined minimum threshold of similarity; or utilizing a rule-based library for the target language. Once identified, the named entity is labeled with a pre-determined category and stored in a retrievable electronic database.
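The three identification routes named in the abstract (exact mapping, similarity above a minimum threshold, and a rule-based library) can be illustrated with a small sketch. This is not the patented implementation, only one possible reading of the abstract under several assumptions: the sentence and the entity are taken as already translated (the translation module is left out), character-level similarity from Python's `difflib` stands in for semantic similarity, the 0.8 threshold is arbitrary, and `RULES`, `find_entity`, `label_and_store`, and the `ner_labels` table are hypothetical names invented for this example.

```python
import re
import sqlite3
from difflib import SequenceMatcher

# Hypothetical rule-based library for the target language (Spanish here):
# one regular expression per entity category.
RULES = {
    "DATE": re.compile(r"\b\d{1,2} de \w+ de \d{4}\b"),
}


def find_entity(translated_sentence, translated_entity, category, threshold=0.8):
    """Return (span, method) for the entity in the translated sentence, or None."""
    # 1. Exact mapping: the translated entity appears verbatim.
    if translated_entity in translated_sentence:
        return translated_entity, "exact"

    # 2. Similarity: score every short token span against the entity.
    #    Assumption: character-level similarity stands in for semantic similarity.
    tokens = translated_sentence.split()
    max_len = len(translated_entity.split()) + 1
    best_span, best_score = None, 0.0
    for size in range(1, max_len + 1):
        for start in range(len(tokens) - size + 1):
            span = " ".join(tokens[start:start + size])
            score = SequenceMatcher(
                None, span.lower(), translated_entity.lower()
            ).ratio()
            if score > best_score:
                best_span, best_score = span, score
    if best_span is not None and best_score >= threshold:
        return best_span, "similar"

    # 3. Rule-based library for the target language.
    rule = RULES.get(category)
    match = rule.search(translated_sentence) if rule else None
    if match:
        return match.group(0), "rule"
    return None


def label_and_store(conn, sentence, translated_entity, category):
    """Label the located entity with its category and store it in the database."""
    found = find_entity(sentence, translated_entity, category)
    if found is None:
        return None
    span, method = found
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ner_labels "
        "(sentence TEXT, entity TEXT, category TEXT, method TEXT)"
    )
    conn.execute(
        "INSERT INTO ner_labels VALUES (?, ?, ?, ?)",
        (sentence, span, category, method),
    )
    return sentence, span, category, method


# Usage: an in-memory SQLite database plays the role of the retrievable store.
conn = sqlite3.connect(":memory:")
label_and_store(conn, "Vivo en Londres", "Londres", "LOCATION")
label_and_store(conn, "Viajo a Nueva York hoy", "Nueva-York", "LOCATION")
label_and_store(conn, "La reunion es el 5 de marzo de 2021", "March 5, 2021", "DATE")
```

The routes are tried in order of decreasing precision, so the rule-based library only runs when neither the exact nor the similarity match succeeds. Recording the matching route next to each label is a choice made for this sketch; it makes it easy to audit which labels came from the fallback rules.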
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.