Patent · US Active

Text classification using automatically generated seed data

US10671812B2 · kind B2 · utility

Cited by	14

References	8

Claims	20

Family size	0

Assignee

EQUIFAX INC. · US

Inventors

Rajkumar BONDUGULA · Marietta, US
Allan Joshua · Atlanta, US
Hongchao Li · Dongguan, CN
Hannah Wang · Duluth, US

Key dates

Filing date	Mar 22, 2018
Grant date	Jun 2, 2020
Priority date	—
Expiry date	Mar 22, 2038

Classification

Technology area (CPC G)	Physics
CPC primary	G06N20/00
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Certain aspects produce a scoring model that can automatically classify future text samples. In some examples, a processing device performs operations for producing a scoring model using active learning. The operations include receiving existing text samples and searching a stored, pre-trained corpus defining embedding vectors for selected words, phrases, or documents to produce nearest neighbor vectors for each embedding vector. Nearest neighbor selections are identified based on distance between each nearest neighbor vector and the embedding vector for each selection to produce a text cloud. Text samples are selected from the text cloud to produce seed data that is used to train a text classifier. A scoring model can be produced based on the text classifier. The scoring model can receive a plurality of new text samples and provide a score indicative of a likelihood of being a member of a selected class.
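
The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation. The embedding values, the cosine distance measure, the distance threshold, the keyword-based seed labeling rule, and the naive Bayes classifier are all assumptions chosen for illustration, as are the names EMBEDDINGS, build_text_cloud, make_seed_data, train_classifier, and score. The active learning loop is omitted.

```python
import math
from collections import Counter

# Hypothetical stand-in for a stored, pre-trained corpus of embedding vectors.
EMBEDDINGS = {
    "refund": [0.9, 0.1],
    "chargeback": [0.85, 0.2],
    "dispute": [0.8, 0.3],
    "weather": [0.1, 0.9],
    "sunny": [0.05, 0.95],
    "rain": [0.2, 0.85],
}


def cosine_distance(a, b):
    """Distance between two vectors (assumed metric; the abstract names none)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)


def build_text_cloud(selection, k=2, max_distance=0.05):
    """Return the selection plus its k nearest neighbors within max_distance."""
    query = EMBEDDINGS[selection]
    ranked = sorted(
        (cosine_distance(query, vec), word)
        for word, vec in EMBEDDINGS.items()
        if word != selection
    )
    return [selection] + [w for dist, w in ranked[:k] if dist <= max_distance]


def make_seed_data(samples, cloud):
    """Label a sample 1 if it contains any text-cloud term, else 0 (assumed rule)."""
    terms = set(cloud)
    return [(s, int(any(tok in terms for tok in s.split()))) for s in samples]


def train_classifier(seed):
    """Collect per-class word counts for a small naive Bayes text classifier."""
    word_counts = {0: Counter(), 1: Counter()}
    doc_counts = Counter()
    for text, label in seed:
        doc_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, doc_counts, vocab


def score(model, text):
    """Scoring model: estimated probability that text belongs to class 1."""
    word_counts, doc_counts, vocab = model
    totals = {c: sum(word_counts[c].values()) + len(vocab) for c in (0, 1)}
    log_odds = math.log(doc_counts[1] / doc_counts[0])
    for tok in text.split():
        if tok in vocab:  # tokens unseen in the seed data are ignored
            p1 = (word_counts[1][tok] + 1) / totals[1]  # Laplace smoothing
            p0 = (word_counts[0][tok] + 1) / totals[0]
            log_odds += math.log(p1 / p0)
    return 1.0 / (1.0 + math.exp(-log_odds))


existing_samples = [
    "refund please",
    "chargeback filed on card",
    "sunny weather today",
    "rain tomorrow",
    "dispute the charge",
]
cloud = build_text_cloud("refund")
seed = make_seed_data(existing_samples, cloud)
model = train_classifier(seed)
```

Calling score(model, "chargeback on my card") yields a value above 0.5, while score(model, "rain today") falls below it. In this toy setup the text cloud does the labeling work that would otherwise require manual annotation, which is the role the abstract assigns to automatically generated seed data.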

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.