Patent · US Active

Method and device for optimizing training set for text classification and storage medium

US11507882B2 · kind B2 · utility

1 Cited by

1 References

18 Claims

0 Family size

Assignee

Beijing Xiaomi Intelligent Technology Co., Ltd. · CN

Inventors

Hongxu Ji · Beijing, CN
Qun Guo · Bellevue, US
Xiao Lu · Beijing, CN
Erli Meng · Beijing, CN

Key dates

Filing date	Nov 25, 2019
Grant date	Nov 22, 2022
Priority date	—
Expiry date	Mar 19, 2041

Classification

Technology area (CPC G)	Physics
CPC primary	G06N20/00
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A method for optimizing a training set for text classification includes: the training set for text classification is acquired; some of the samples are selected from the training set as a first initial training subset, and an incorrectly tagged sample in the first initial training subset is corrected to obtain a second initial training subset; a text classification model is trained according to the second initial training subset; the samples in the training set are predicted by the trained text classification model to obtain a prediction result; an incorrectly tagged sample set is generated according to the prediction result; a key incorrectly tagged sample is selected from the incorrectly tagged sample set, and a tag of the key incorrectly tagged sample is corrected to generate a correctly tagged sample corresponding to the key incorrectly tagged sample; and the training set is updated by using the correctly tagged sample.
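The loop described in the abstract can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation. The classifier is a toy multinomial naive Bayes over whitespace tokens. `oracle` is a hypothetical stand-in for a human annotator who returns the correct tag for a text. The first initial subset is drawn at random. A "key" incorrectly tagged sample is assumed to be one where the model's score for its predicted class most exceeds its score for the current tag. The abstract does not say how key samples are chosen, so this margin rule is only an illustrative choice. The names `NaiveBayes`, `optimize_training_set`, `DEMO_CLEAN` and `DEMO_NOISY` are all hypothetical.

```python
import math
import random
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, str]  # (text, tag)


class NaiveBayes:
    """Toy multinomial naive Bayes with add-one smoothing (assumed model)."""

    def fit(self, samples: List[Sample]) -> "NaiveBayes":
        self.class_counts = Counter(tag for _, tag in samples)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for text, tag in samples:
            self.word_counts[tag].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total = len(samples)
        return self

    def log_scores(self, text: str) -> Dict[str, float]:
        """Unnormalized log-posterior for every class seen in training."""
        scores = {}
        for tag, n in self.class_counts.items():
            counts = self.word_counts[tag]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / self.total)
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / denom)
            scores[tag] = score
        return scores


def optimize_training_set(
    training_set: List[Sample],
    oracle: Callable[[str], str],  # hypothetical human re-tagger
    subset_size: int,
    num_key: int,
    seed: int = 0,
) -> List[Sample]:
    data = list(training_set)  # acquire the training set (copied, not mutated)
    rng = random.Random(seed)

    # First initial training subset: a random part of the samples (assumption).
    subset_idx = rng.sample(range(len(data)), subset_size)
    # Second initial training subset: the same texts with oracle-corrected tags.
    second_subset = [(data[i][0], oracle(data[i][0])) for i in subset_idx]

    model = NaiveBayes().fit(second_subset)

    # Predict every sample; those whose prediction disagrees with the current
    # tag form the incorrectly tagged sample set.
    incorrect = []
    for i, (text, tag) in enumerate(data):
        scores = model.log_scores(text)
        predicted = max(scores, key=scores.get)
        if predicted != tag:
            # Margin by which the model prefers its prediction over the tag;
            # a tag never seen in the subset gets an infinite margin.
            margin = scores[predicted] - scores.get(tag, -math.inf)
            incorrect.append((margin, i))

    # Key incorrectly tagged samples: the num_key largest margins (assumption).
    incorrect.sort(reverse=True)
    for _, i in incorrect[:num_key]:
        text = data[i][0]
        data[i] = (text, oracle(text))  # update the set with the corrected tag
    return data


# Hypothetical demo data: in DEMO_NOISY the last two samples carry wrong tags.
DEMO_CLEAN: List[Sample] = [
    ("great movie loved it", "pos"),
    ("wonderful film great acting", "pos"),
    ("loved the wonderful story", "pos"),
    ("terrible movie hated it", "neg"),
    ("awful film bad acting", "neg"),
    ("hated the awful story", "neg"),
    ("great wonderful loved", "pos"),
    ("terrible awful hated", "neg"),
]
DEMO_NOISY: List[Sample] = DEMO_CLEAN[:6] + [
    ("great wonderful loved", "neg"),
    ("terrible awful hated", "pos"),
]

if __name__ == "__main__":
    truth = dict(DEMO_CLEAN)
    fixed = optimize_training_set(DEMO_NOISY, truth.__getitem__, subset_size=8, num_key=2)
    print(fixed == DEMO_CLEAN)
```

Ranking by margin spends the annotator's effort on the samples the model most strongly disputes. `num_key` caps how many tags are re-checked per round. Repeating the call on the returned set would give an iterative version of the loop, which is again an assumption rather than something the abstract states.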

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.