Method and device for optimizing training set for text classification and storage medium
US11507882B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Nov 25, 2019 |
| Grant date | Nov 22, 2022 |
| Priority date | — |
| Expiry date | Mar 19, 2041 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG06N20/00
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
A method for optimizing a training set for text classification includes: the training set for text classification is acquired; part of samples are selected from the training set as a first initial training subset, and an incorrectly tagged sample in the first initial training subset is corrected to obtain a second initial training subset; a text classification model is trained according to the second initial training subset; the samples in the training set are predicted by the trained text classification model to obtain a prediction result; an incorrectly tagged sample set is generated according to the prediction result; a key incorrectly tagged sample is selected from the incorrectly tagged sample set, and a tag of the key incorrectly tagged sample is corrected to generate a correctly tagged sample corresponding to the key incorrectly tagged sample; and the training set is updated by using the correctly tagged sample.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.