Patent · US Expired

Automatic labeling of unlabeled text data

US6697998B1 · kind B1 · utility

Cited by: 106
References: 5
Claims: 5
Family size: 0

Key dates

Filing date: Jun 12, 2000
Grant date: Feb 24, 2004
Expiry date: Aug 21, 2022

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/355
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method of automatically labeling unlabeled text data can be practiced without human intervention, although manual intervention is not precluded. The method can be used to extract relevant features of unlabeled text data for keyword search. It uses a document collection as a reference answer set. Members of the answer set are converted to vectors that serve as the centroids of unknown groups of unlabeled text data. The unlabeled text data are clustered around these centroids by a nearest-neighbor algorithm, and the ID of the corresponding answer is assigned to every document in its cluster. The data are now labeled, so a supervised machine learning algorithm can be trained on them, and it outputs a classifier for assigning labels to new text data. Alternatively, a feature extraction algorithm may be run on the classes produced by the clustering step to output search features that index the unlabeled text data.
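The abstract names the steps but not the vector representation, similarity measure, learner, or feature extractor. The sketch below is a minimal illustration that fills those gaps with assumptions: bag-of-words term-frequency vectors with a tiny stop-word list, cosine similarity for the nearest-neighbor step, a nearest-centroid (Rocchio-style) classifier standing in for the unspecified supervised learner, and per-class term frequency standing in for feature extraction. The function names (`vectorize`, `label_by_nearest_answer`, `train_centroid_classifier`, `top_features`) and the toy answers and documents are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Assumption: a tiny stop-word list so function words do not dominate the vectors.
STOP_WORDS = {"a", "an", "the", "how", "do", "i", "my", "your",
              "from", "in", "into", "of", "to", "and", "is"}


def vectorize(text):
    """Bag-of-words term-frequency vector (an assumed representation)."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine(a, b):
    """Cosine similarity between two sparse Counter vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def label_by_nearest_answer(answers, documents):
    """Use each reference answer as a fixed centroid; each unlabeled document gets
    the ID of its nearest answer, so documents sharing an ID form one cluster.
    Ties (including all-zero similarity) go to the first answer in dict order."""
    centroids = {answer_id: vectorize(text) for answer_id, text in answers.items()}
    labels = {}
    for doc_id, text in documents.items():
        vec = vectorize(text)
        labels[doc_id] = max(centroids, key=lambda answer_id: cosine(vec, centroids[answer_id]))
    return labels


def train_centroid_classifier(documents, labels):
    """Stand-in supervised learner: average the vectors of each class and
    classify new text by its most similar class centroid."""
    sums, counts = defaultdict(Counter), Counter()
    for doc_id, label in labels.items():
        sums[label].update(vectorize(documents[doc_id]))
        counts[label] += 1
    centroids = {label: Counter({t: c / counts[label] for t, c in vec.items()})
                 for label, vec in sums.items()}

    def classify(text):
        vec = vectorize(text)
        return max(centroids, key=lambda label: cosine(vec, centroids[label]))

    return classify


def top_features(documents, labels, k=3):
    """Stand-in feature extraction: the k most frequent terms in each class,
    usable as search keywords that index the clustered documents."""
    per_class = defaultdict(Counter)
    for doc_id, label in labels.items():
        per_class[label].update(vectorize(documents[doc_id]))
    return {label: [t for t, _ in c.most_common(k)] for label, c in per_class.items()}


# Hypothetical reference answer set and unlabeled documents.
answers = {
    "A1": "reset your password from the account settings page",
    "A2": "clear a printer paper jam from the tray",
}
documents = {
    "d1": "how do I reset my password",
    "d2": "forgot password cannot log into account",
    "d3": "paper jam in the printer tray",
    "d4": "printer shows jam error",
}

labels = label_by_nearest_answer(answers, documents)
print(labels)  # {'d1': 'A1', 'd2': 'A1', 'd3': 'A2', 'd4': 'A2'}
classify = train_centroid_classifier(documents, labels)
print(classify("printer jam again"))  # A2
print(top_features(documents, labels, k=2))
```

Because the answers act as fixed centroids, the clustering step needs only one nearest-neighbor pass, with no k-means-style re-estimation of centroids. Swapping in TF-IDF weighting or a different classifier would change the quality of the labels but not the overall flow.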

Source: USPTO / EPO open patent data (bibliographic data and citation counts).