Patent · US Active

Method and device for optimizing training set for text classification and storage medium

US11507882B2 · kind B2 · utility

1Cited by
1References
18Claims
0Family size

Assignee

Inventors

Key dates

Filing dateNov 25, 2019
Grant dateNov 22, 2022
Priority date
Expiry dateMar 19, 2041

Classification

  • Technology area (CPC G)Physics
  • CPC primaryG06N20/00
  • WIPO fieldComputer technology
  • WIPO sectorElectrical engineering

Abstract

A method for optimizing a training set for text classification includes: the training set for text classification is acquired; part of samples are selected from the training set as a first initial training subset, and an incorrectly tagged sample in the first initial training subset is corrected to obtain a second initial training subset; a text classification model is trained according to the second initial training subset; the samples in the training set are predicted by the trained text classification model to obtain a prediction result; an incorrectly tagged sample set is generated according to the prediction result; a key incorrectly tagged sample is selected from the incorrectly tagged sample set, and a tag of the key incorrectly tagged sample is corrected to generate a correctly tagged sample corresponding to the key incorrectly tagged sample; and the training set is updated by using the correctly tagged sample.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.