Patent · US Active

Text categorization with knowledge transfer from heterogeneous datasets

US8103671B2 · kind B2 · utility

Cited by: 8
References: 22
Claims: 24
Family size: 0

Assignee

Inventors

Key dates

Filing date: Oct 10, 2008
Grant date: Jan 24, 2012
Priority date
Expiry date: Jun 15, 2030

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06N20/10
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

The present invention provides a method for incorporating features from heterogeneous auxiliary datasets into input text data for use in classification. Heterogeneous auxiliary datasets, such as labeled datasets and unlabeled datasets, are accessed after receiving input text data. Features are extracted from each of the heterogeneous auxiliary datasets. The features are combined with the input text data to generate a set of features which may potentially be used to classify the input text data. Classification features are then extracted from the set of features and used to classify the input text data. In one embodiment, the classification features are extracted by calculating a mutual information value associated with each feature in the set of features and identifying features having a mutual information value exceeding a threshold value.
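
The mutual-information embodiment described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the mutual information value is estimated, so the code below makes several assumptions: features are binary (a term is present in a document or not), mutual information is a plug-in estimate from raw counts measured in bits, and the names `mutual_information` and `select_features`, the toy documents, and the threshold of 0.5 are all hypothetical. It is not the patented implementation.

```python
import math
from collections import Counter


def mutual_information(feature_values, labels):
    """Plug-in estimate of I(feature; label) in bits from paired observations."""
    n = len(labels)
    joint = Counter(zip(feature_values, labels))
    feature_counts = Counter(feature_values)
    label_counts = Counter(labels)
    mi = 0.0
    for (f, c), n_fc in joint.items():
        # p(f, c) * log2( p(f, c) / (p(f) * p(c)) ), written with raw counts
        mi += (n_fc / n) * math.log2(n_fc * n / (feature_counts[f] * label_counts[c]))
    return mi


def select_features(docs, labels, threshold):
    """Keep features whose mutual information with the label exceeds threshold.

    docs is a list of sets; each set holds the features present in one document
    (assumed here to be the combined input and auxiliary features).
    """
    vocabulary = set().union(*docs)
    scores = {}
    for feature in vocabulary:
        presence = [feature in doc for doc in docs]
        scores[feature] = mutual_information(presence, labels)
    return {f: s for f, s in scores.items() if s > threshold}


# Toy data (assumed for illustration only).
docs = [
    {"goal", "match"},
    {"goal", "team"},
    {"vote", "match"},
    {"vote", "party"},
]
labels = ["sports", "sports", "politics", "politics"]

selected = select_features(docs, labels, threshold=0.5)
print(sorted(selected))
```

In this toy data, "goal" and "vote" each determine the class and so carry the maximum of one bit, while "match" appears equally often in both classes and carries none. The threshold controls how aggressively weakly informative features are discarded.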

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.