Patent · US Active

Text classification using automatically generated seed data

US10671812B2 · kind B2 · utility

Cited by: 14
References: 8
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Mar 22, 2018
Grant date: Jun 2, 2020
Priority date
Expiry date: Mar 22, 2038

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06N20/00
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Certain aspects produce a scoring model that can automatically classify future text samples. In some examples, a processing device performs operations for producing a scoring model using active learning. The operations include receiving existing text samples and searching a stored, pre-trained corpus defining embedding vectors for selected words, phrases, or documents to produce nearest neighbor vectors for each embedding vector. Nearest neighbor selections are identified based on the distance between each nearest neighbor vector and the embedding vector for each selection to produce a text cloud. Text samples are selected from the text cloud to produce seed data that is used to train a text classifier. A scoring model can be produced based on the text classifier. The scoring model can receive a plurality of new text samples and provide a score indicative of a likelihood of being a member of a selected class.
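
The flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. The toy embedding table, the cosine distance measure, the cutoff values, the "contains a cloud term" labeling rule, and the naive Bayes classifier are all assumptions chosen for brevity, and every function name here is hypothetical. The active learning loop mentioned in the abstract is omitted.

```python
import math

# Hypothetical stand-in for a stored, pre-trained corpus of embedding vectors.
EMBEDDINGS = {
    "refund":     [0.90, 0.10, 0.00],
    "chargeback": [0.80, 0.20, 0.10],
    "reimburse":  [0.85, 0.15, 0.05],
    "invoice":    [0.60, 0.10, 0.50],
    "weather":    [0.00, 0.90, 0.30],
    "sunny":      [0.10, 0.80, 0.40],
}

def cosine_distance(a, b):
    """Return 1 - cosine similarity (assumed distance measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def text_cloud(selections, k=2, max_distance=0.1):
    """Expand each selected term with its k nearest neighbors within max_distance."""
    cloud = set(selections)
    for word in selections:
        query = EMBEDDINGS[word]
        neighbors = sorted(
            (cosine_distance(query, vec), other)
            for other, vec in EMBEDDINGS.items() if other != word
        )
        cloud.update(w for d, w in neighbors[:k] if d <= max_distance)
    return cloud

def make_seed_data(samples, cloud):
    """Label a sample 1 if it contains any cloud term, else 0 (assumed rule)."""
    return [(s, int(any(t in cloud for t in s.lower().split()))) for s in samples]

def train_scorer(seed):
    """Fit a tiny naive Bayes model with add-one smoothing; return a scoring function."""
    counts, docs = {0: {}, 1: {}}, {0: 0, 1: 0}
    for text, label in seed:
        docs[label] += 1
        for tok in text.lower().split():
            counts[label][tok] = counts[label].get(tok, 0) + 1
    vocab = set(counts[0]) | set(counts[1])
    totals = {c: sum(counts[c].values()) for c in (0, 1)}

    def score(text):
        # Smoothed class prior, then per-token likelihood ratios in log space.
        log_odds = math.log((docs[1] + 1) / (docs[0] + 1))
        for tok in text.lower().split():
            if tok not in vocab:
                continue  # ignore tokens never seen in the seed data
            p1 = (counts[1].get(tok, 0) + 1) / (totals[1] + len(vocab))
            p0 = (counts[0].get(tok, 0) + 1) / (totals[0] + len(vocab))
            log_odds += math.log(p1 / p0)
        return 1.0 / (1.0 + math.exp(-log_odds))  # likelihood of the selected class

    return score

existing_samples = [
    "please refund my order",
    "i want to reimburse the fee",
    "sunny weather today",
    "the weather is nice",
]
cloud = text_cloud(["refund"])
seed = make_seed_data(existing_samples, cloud)
score = train_scorer(seed)
print(sorted(cloud))
print(score("refund the fee") > score("nice weather"))
```

In this sketch the single selected term "refund" is expanded to its close neighbors, existing samples that mention any cloud term become positive seed examples, and the returned `score` function plays the role of the scoring model by mapping a new text sample to a value between 0 and 1.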

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.