Patent · US Active

Feature reweighting in text classifier generation using unlabeled data

US11216619B2 · kind B2 · utility

Cited by: 5
References: 5
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Apr 28, 2020
Grant date: Jan 4, 2022
Priority date
Expiry date: May 2, 2040

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06N20/00
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A mechanism is provided to implement a text classifier training augmentation mechanism for incorporating unlabeled data into the generation of a text classifier. For each term of a plurality of terms in each document of a plurality of documents in a set of unlabeled data, a term frequency value is determined. The term frequency value is normalized by dividing it by the total number of terms in the document. An inverse document frequency (idf) value is determined for each term based on the term frequency value. A subset of terms is filtered from the plurality of terms based on the determined idf values. The idf values for the remaining terms are transformed into feature weights. Terms from a set of labeled data are re-weighted based on the feature weights determined from the set of unlabeled data. The text classifier is then generated using the re-weighted labeled data.
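
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Several details are assumptions made only for illustration, since the abstract does not specify them: the smoothed idf formula based on document frequency, the direction of the filter (terms with idf below a hypothetical `min_idf` threshold are dropped), the scaling of the remaining idf values by their maximum, and all function and variable names.

```python
import math
from collections import Counter


def normalized_tf(doc_tokens):
    """Term frequency divided by the total number of terms in the document."""
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    return {term: count / total for term, count in counts.items()}


def idf_from_unlabeled(unlabeled_docs):
    """Assumed smoothed idf: log((1 + N) / (1 + df)) + 1, computed on unlabeled docs."""
    n_docs = len(unlabeled_docs)
    doc_freq = Counter()
    for tokens in unlabeled_docs:
        doc_freq.update(set(tokens))  # count each term once per document
    return {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }


def feature_weights(idf, min_idf):
    """Assumed filter and transform: drop low-idf terms, scale the rest to (0, 1]."""
    kept = {term: value for term, value in idf.items() if value >= min_idf}
    if not kept:
        return {}
    top = max(kept.values())
    return {term: value / top for term, value in kept.items()}


def reweight(doc_tokens, weights):
    """Re-weight a labeled document's normalized tf by the unlabeled-data weights."""
    tf = normalized_tf(doc_tokens)
    return {term: tf[term] * weights[term] for term in tf if term in weights}


# Toy unlabeled corpus (hypothetical data).
unlabeled = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "ran"],
]
idf = idf_from_unlabeled(unlabeled)
weights = feature_weights(idf, min_idf=1.1)  # "the" occurs in every document and is dropped

# One labeled document (hypothetical), re-weighted with the weights above.
labeled_doc = ["the", "cat", "sat", "sat"]
features = reweight(labeled_doc, weights)
print(sorted(features))
```

The resulting `features` dictionaries, one per labeled document, would then be paired with their labels and passed to whatever classifier training procedure is in use. The sketch stops before that step because the abstract does not name a classifier type.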

Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.