Generating training data for machine learning
US10679144B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jul 12, 2016 |
| Grant date | Jun 9, 2020 |
| Priority date | — |
| Expiry date | Apr 10, 2039 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N5/022
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A computer-implemented method includes receiving a rule, wherein the rule includes at least one token, and receiving at least two dictionaries, wherein the at least two dictionaries include at least one general language dictionary and at least one domain-specific dictionary for a domain. The computer-implemented method further includes, for each of the at least one token, selecting at least one word at random from at least one of the at least two dictionaries and adding the at least one word to a test data line, such that the test data line includes a candidate statement conforming to the rule. The computer-implemented method further includes filtering the candidate statement based on a domain-specific model for the domain and including the candidate statement in training data provided to a machine learning model. A corresponding computer program product and computer system are also disclosed.
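The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that a rule is a sequence of part-of-speech tokens, that each dictionary maps a token to a list of words, and that the domain-specific model can be stood in for by a simple vocabulary check. All names here (`GENERAL`, `DOMAIN`, `RULE`, `generate_candidate`, `domain_filter`, `build_training_data`) are hypothetical and chosen for illustration only.

```python
import random

# Hypothetical dictionaries: each maps a rule token to candidate words.
GENERAL = {"DET": ["the", "a"], "NOUN": ["patient", "table"], "VERB": ["takes", "sees"]}
DOMAIN = {"NOUN": ["aspirin", "dosage"], "VERB": ["prescribes"]}
DOMAIN_VOCAB = {"aspirin", "dosage", "prescribes"}

# Hypothetical rule: a sequence of tokens that a candidate statement must follow.
RULE = ["DET", "NOUN", "VERB", "NOUN"]


def generate_candidate(rule, dictionaries, rng):
    """Build one test data line by picking a random word for each token."""
    words = []
    for token in rule:
        # Collect the word lists of every dictionary that covers this token.
        pools = [d[token] for d in dictionaries if d.get(token)]
        if not pools:
            raise ValueError(f"no dictionary has words for token {token!r}")
        # Pick a dictionary at random, then a word at random from it.
        words.append(rng.choice(rng.choice(pools)))
    return " ".join(words)


def domain_filter(statement, domain_vocab):
    """Stand-in for the domain-specific model (assumption): keep a
    statement only if it contains at least one domain word."""
    return any(word in domain_vocab for word in statement.split())


def build_training_data(rule, dictionaries, domain_vocab, n, seed=0):
    """Generate n candidates and return those that pass the filter."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        candidate = generate_candidate(rule, dictionaries, rng)
        if domain_filter(candidate, domain_vocab):
            kept.append(candidate)
    return kept
```

Calling `build_training_data(RULE, [GENERAL, DOMAIN], DOMAIN_VOCAB, n=20)` returns a list of four-word statements that follow the rule and mention at least one domain word. In a real system the vocabulary check would presumably be replaced by a trained domain model, which the abstract does not describe in detail.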
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.