Patent · US Active

Generating training data for machine learning

US10679144B2 · kind B2 · utility

Cited by: 0
References: 2
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jul 12, 2016
Grant date: Jun 9, 2020
Priority date
Expiry date: Apr 10, 2039

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06N5/022
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A computer-implemented method includes receiving a rule, wherein the rule includes at least one token, and receiving at least two dictionaries, wherein the at least two dictionaries include at least one general language dictionary and at least one domain-specific dictionary for a domain. The computer-implemented method further includes, for each of the at least one token, selecting at least one word at random from at least one of the at least two dictionaries and adding the at least one word to a test data line, such that the test data line includes a candidate statement conforming to the rule. The computer-implemented method further includes filtering the candidate statement based on a domain-specific model for the domain and including the candidate statement in training data provided to a machine learning model. A corresponding computer program product and computer system are also disclosed.
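The flow described in the abstract can be pictured with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: the rule is modeled as a list of part-of-speech tokens, each dictionary as a mapping from token to words, and the "domain-specific model" as a toy filter that keeps a statement only if it contains at least one domain word. The word lists are invented.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical illustration: every name and word list below is an assumption.
Dictionary = Dict[str, List[str]]  # maps a token (e.g. "NOUN") to candidate words

GENERAL: Dictionary = {
    "DET": ["the", "a"],
    "NOUN": ["report", "person"],
    "VERB": ["shows", "needs"],
}
MEDICAL: Dictionary = {
    "NOUN": ["biopsy", "stent"],
    "VERB": ["occludes", "metastasizes"],
}


def generate_candidate(rule: Sequence[str],
                       dictionaries: Sequence[Dictionary],
                       rng: random.Random) -> str:
    """Build one test data line by filling each token with a random word."""
    words = []
    for token in rule:
        # Only dictionaries that list words for this token can supply one.
        sources = [d for d in dictionaries if d.get(token)]
        if not sources:
            raise ValueError(f"no dictionary covers token {token!r}")
        words.append(rng.choice(rng.choice(sources)[token]))
    return " ".join(words)


def toy_domain_filter(statement: str) -> bool:
    """Stand-in for the domain-specific model: require one domain word."""
    domain_words = {w for ws in MEDICAL.values() for w in ws}
    return any(w in domain_words for w in statement.split())


def build_training_data(rule: Sequence[str],
                        dictionaries: Sequence[Dictionary],
                        keep: Callable[[str], bool],
                        n_candidates: int,
                        seed: int = 0) -> List[str]:
    """Generate candidates, then keep only those the filter accepts."""
    rng = random.Random(seed)
    candidates = (generate_candidate(rule, dictionaries, rng)
                  for _ in range(n_candidates))
    return [c for c in candidates if keep(c)]


RULE = ["DET", "NOUN", "VERB", "DET", "NOUN"]
training_data = build_training_data(RULE, [GENERAL, MEDICAL],
                                    toy_domain_filter, n_candidates=50)
```

In this sketch the filter is a simple vocabulary check; a real domain-specific model could be any scorer that accepts or rejects a candidate statement, and it would be passed in through the same `keep` parameter.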

Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.