Patent · US Active

System and method to generate a labeled dataset for training an entity detection system

US11681944B2 · kind B2 · utility

Cited by: 2
References: 17
Claims: 36
Family size: 0

Key dates

Filing date: Aug 9, 2018
Grant date: Jun 20, 2023
Expiry date: Nov 12, 2041

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/295
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

“Semi-supervised” machine learning relies on less human input than supervised learning to train a machine learning algorithm to perform named entity recognition (NER). Starting with a known entity value or known pattern value for a specific entity type, phrases in a training data corpus that include the known entity value are identified. Context-value patterns are generated to match selected phrases that include the known entity value. One or more context-value patterns may be validated based on human input. The validated patterns are then used to identify additional entity values. A subset of the additional entity values may also be validated based on human input. Occurrences of validated entity values may be labeled in the training corpus. Sample phrases from the labeled training dataset may be extracted to form a reduced-size training set for a supervised machine learning model, which may then be used in production to label data for any named entity recognition application.
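The abstract does not define a pattern format or matching rules, so the following is only a minimal Python sketch of the bootstrapping loop it describes, under stated assumptions. Here a context-value pattern is assumed to be one word of left context and one word of right context around a single-token value slot. The two human-validation steps are replaced by hypothetical callbacks (`accept_pattern`, `accept_value`), and the "reduced-size training set" is assumed to be simply the first `max_samples` labeled sentences. Every name below (`ContextValuePattern`, `bootstrap`, `ORDER_ID`, and the others) is hypothetical and illustrative; this is not the patented implementation.

```python
import re
from dataclasses import dataclass
from typing import Callable

PUNCT = ".,;:!?\"'()"


@dataclass(frozen=True)
class ContextValuePattern:
    """Hypothetical pattern: fixed context words around a one-token value slot."""
    left: str
    right: str

    def regex(self) -> "re.Pattern[str]":
        # Context must sit on whitespace boundaries; the value slot is one token.
        return re.compile(
            rf"(?<!\S){re.escape(self.left)}\s+(\S+)\s+{re.escape(self.right)}(?!\S)"
        )


def find_phrases(corpus: list[str], value: str) -> list[str]:
    """Phrases in the corpus that contain the known entity value."""
    return [s for s in corpus if value in s]


def generate_patterns(phrases: list[str], value: str, window: int = 1) -> set[ContextValuePattern]:
    """Build context-value patterns from the words around each occurrence of value."""
    patterns: set[ContextValuePattern] = set()
    for phrase in phrases:
        tokens = phrase.split()
        for i, tok in enumerate(tokens):
            if tok.strip(PUNCT) == value and i >= window and i + window < len(tokens):
                left = " ".join(tokens[i - window:i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                patterns.add(ContextValuePattern(left, right))
    return patterns


def extract_values(corpus: list[str], patterns: set[ContextValuePattern]) -> set[str]:
    """Apply the validated patterns to collect candidate entity values."""
    found: set[str] = set()
    for sentence in corpus:
        for pattern in patterns:
            for m in pattern.regex().finditer(sentence):
                value = m.group(1).strip(PUNCT)
                if value:
                    found.add(value)
    return found


Span = tuple[int, int, str]


def label_corpus(corpus: list[str], values: set[str], entity_type: str) -> list[tuple[str, list[Span]]]:
    """Label every whole-token occurrence of a validated value as (start, end, type)."""
    labeled = []
    for sentence in corpus:
        spans: list[Span] = []
        for value in values:
            for m in re.finditer(rf"(?<!\w){re.escape(value)}(?!\w)", sentence):
                spans.append((m.start(), m.end(), entity_type))
        labeled.append((sentence, sorted(spans)))
    return labeled


def bootstrap(
    corpus: list[str],
    seed_value: str,
    entity_type: str,
    accept_pattern: Callable[[ContextValuePattern], bool],
    accept_value: Callable[[str], bool],
    max_samples: int = 100,
) -> tuple[set[str], list[tuple[str, list[Span]]]]:
    phrases = find_phrases(corpus, seed_value)
    # accept_pattern / accept_value stand in for the human validation steps.
    patterns = {p for p in generate_patterns(phrases, seed_value) if accept_pattern(p)}
    candidates = extract_values(corpus, patterns)
    values = {seed_value} | {v for v in candidates if accept_value(v)}
    labeled = label_corpus(corpus, values, entity_type)
    # Reduced-size training set (assumption): first max_samples labeled sentences.
    sample = [(s, spans) for s, spans in labeled if spans][:max_samples]
    return values, sample


corpus = [
    "please ship order 1001 today",
    "customer asked about order 2002 today",
    "refund for order 3003 today",
    "the weather today is nice",
    "order 4004 was delayed",
]
values, sample = bootstrap(
    corpus, "1001", "ORDER_ID",
    accept_pattern=lambda p: True,       # pretend the reviewer approves every pattern
    accept_value=lambda v: v.isdigit(),  # pretend the reviewer keeps numeric IDs only
    max_samples=2,
)
print(sorted(values))  # ['1001', '2002', '3003']
print(sample[0])       # ('please ship order 1001 today', [(18, 22, 'ORDER_ID')])
```

Starting from the single seed `1001`, the learned pattern `order _ today` harvests `2002` and `3003`. It misses `4004` because that sentence has a different right context. In a real pipeline the two callbacks would route candidates to a human reviewer, and the reduced sample would then feed a supervised NER model; both parts fall outside this sketch.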

Source: USPTO / EPO open patent data (bibliographic data and citation counts).