Automatic labeling of unlabeled text data
US6697998B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Jun 12, 2000 |
| Grant date | Feb 24, 2004 |
| Priority date | — |
| Expiry date | Aug 21, 2022 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/355
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method for automatically labeling unlabeled text data can be practiced without human intervention, although manual intervention is not precluded. The method can also be used to extract relevant features of unlabeled text data for keyword search. It uses a document collection as a reference answer set. Members of the answer set are converted to vectors that serve as centroids of as-yet-unknown groups of unlabeled text data. The unlabeled text data are clustered around these centroids by a nearest neighbor algorithm, and the ID of the corresponding answer is assigned to every document in the cluster. A supervised machine learning algorithm is then trained on the resulting labeled data and outputs a classifier for assigning labels to new text data. Alternatively, a feature extraction algorithm may be run on the classes produced by the clustering step, outputting search features that index the unlabeled text data.
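The pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how documents are vectorized, which similarity measure the nearest neighbor step uses, or which supervised learner is trained. The bag-of-words term counts, cosine similarity, and multinomial naive Bayes classifier below are therefore assumptions chosen for brevity, and every function name and the sample data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Bag-of-words term counts (assumed; the abstract names no vectorization)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def label_by_nearest_answer(answers, documents):
    """Treat each reference answer as a centroid; give every document the ID
    of its most similar answer."""
    centroids = {aid: vectorize(text) for aid, text in answers.items()}
    labeled = []
    for doc in documents:
        vec = vectorize(doc)
        best = max(centroids, key=lambda aid: cosine(vec, centroids[aid]))
        labeled.append((doc, best))
    return labeled


def train_naive_bayes(labeled):
    """Train a multinomial naive Bayes classifier (assumed learner) on the
    automatically labeled documents and return a classify(text) function."""
    class_docs = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in labeled:
        tokens = doc.lower().split()
        class_docs[label] += 1
        term_counts[label].update(tokens)
        vocab.update(tokens)
    total_docs = sum(class_docs.values())

    def classify(text):
        # Tokens never seen in training carry no evidence and are skipped.
        tokens = [t for t in text.lower().split() if t in vocab]

        def score(label):
            n = sum(term_counts[label].values())
            s = math.log(class_docs[label] / total_docs)
            for t in tokens:
                # Laplace (add-one) smoothing.
                s += math.log((term_counts[label][t] + 1) / (n + len(vocab)))
            return s

        return max(class_docs, key=score)

    return classify


def top_terms(labeled, k=3):
    """Alternative branch: per-class search features, scored as the term count
    inside the class minus the count in all other classes (assumed scoring)."""
    per_class = defaultdict(Counter)
    overall = Counter()
    for doc, label in labeled:
        tokens = doc.lower().split()
        per_class[label].update(tokens)
        overall.update(tokens)
    features = {}
    for label, counts in per_class.items():
        ranked = sorted(counts, key=lambda t: (-(2 * counts[t] - overall[t]), t))
        features[label] = ranked[:k]
    return features


# Hypothetical reference answer set and unlabeled documents.
answers = {
    "reset_password": "how to reset a forgotten password",
    "billing": "questions about invoice billing and payment",
}
documents = [
    "i forgot my password and need to reset it",
    "my password reset link expired",
    "the invoice shows a wrong payment amount",
    "billing charged my card twice",
]

labeled = label_by_nearest_answer(answers, documents)
classify = train_naive_bayes(labeled)
features = top_terms(labeled)
```

With this toy data, the first two documents receive the `reset_password` ID and the last two the `billing` ID, after which `classify` can label new text without consulting the answer set. A production system would likely use weighted vectors (for example TF-IDF) and a stronger learner. The sketch only shows how the reference answers bootstrap labels for supervised training or for feature extraction.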
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.