Feature reweighting in text classifier generation using unlabeled data
US11216619B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Apr 28, 2020 |
| Grant date | Jan 4, 2022 |
| Priority date | — |
| Expiry date | May 2, 2040 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG06N20/00
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
A mechanism is provided to implement a text classifier training augmentation mechanism for incorporating unlabeled data into the generation of a text classifier. For each term of a plurality of terms in each document of a plurality of documents in a set of unlabeled data, a term frequency value is determined. The term is normalized by dividing the term frequency value by a total number of terms in the document. An inverse document frequency (idf) value is determined for each term based on the term frequency value. A subset of terms is filtered from the plurality of terms based the determined idf values. The idf values for the remaining terms are transformed into feature weights. Terms from a set of labeled data are re-weighted based on the feature weights determined from the set of unlabeled data. The text classifier is then generated using the re-weighted labeled data.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.