Generation of text classifier training data
US10853580B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Oct 30, 2019 |
| Grant date | Dec 1, 2020 |
| Priority date | — |
| Expiry date | Oct 30, 2039 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N3/045
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method includes receiving input designating a term of interest in a document of a document corpus and determining a target context embedding representing a target word group that includes the term of interest and context words located in the document proximate to the term of interest. The method also includes identifying, from among the document corpus, a first candidate word group that is semantically similar to the target word group and a second candidate word group that is semantically dissimilar to the target word group. The method further includes receiving user input identifying at least a portion of the first candidate word group as associated with a first label and identifying at least a portion of the second candidate word group as not associated with the first label. The method also includes generating labeled training data based on the user input to train a text classifier.
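The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the bag-of-words `embed` function is a stand-in for whatever context embedding the patent uses, the user's labeling input is simulated by `label_examples`, and every function name, the window size, the sample sentences, and the `river_bank` label are hypothetical choices made for illustration only.

```python
import math
from collections import Counter

def word_group(tokens, index, window=2):
    """Return the term at `index` plus up to `window` context words on each side."""
    start = max(0, index - window)
    return tokens[start:index + window + 1]

def embed(words):
    """Toy context embedding (assumption): a bag-of-words count vector."""
    return Counter(w.lower() for w in words)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def pick_candidates(target_group, candidate_groups):
    """Return (most similar, least similar) candidate word groups."""
    target_vec = embed(target_group)
    ranked = sorted(candidate_groups, key=lambda g: cosine(target_vec, embed(g)))
    return ranked[-1], ranked[0]

def label_examples(similar, dissimilar, label):
    """Simulated user input: confirm the similar group, reject the dissimilar one."""
    return [
        (" ".join(similar), label, True),      # associated with the label
        (" ".join(dissimilar), label, False),  # not associated with the label
    ]

# Term of interest "bank" designated in one document of a tiny corpus.
doc = "the river bank flooded after heavy rain".split()
target = word_group(doc, doc.index("bank"), window=2)

# Candidate word groups drawn from other documents (hypothetical data).
candidates = [
    ["the", "river", "bank", "eroded", "slowly"],
    ["bank", "approved", "a", "loan", "today"],
]

similar, dissimilar = pick_candidates(target, candidates)
training_data = label_examples(similar, dissimilar, "river_bank")
```

Showing the user both a similar and a dissimilar candidate yields one positive and one negative example per round, which is why the sketch returns both ends of the ranking rather than only the nearest neighbours.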
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.