Patent · US Active

Identifying training documents for a content classifier

US8352386B2 · kind B2 · utility

Cited by: 8
References: 1
Claims: 24
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jul 2, 2009
Grant date: Jan 8, 2013
Priority date
Expiry date: Jun 17, 2031

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/353
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Systems, methods and articles of manufacture are disclosed for identifying a training document for a content classifier. One or more thresholds may be defined for designating a document as a training document for a content classifier. A plurality of documents may be evaluated to compute a score for each respective document. The score may represent suitability of a document for training the content classifier with respect to a category. The score may be computed based on content of the plurality of documents, metadata of the plurality of documents, link structure of the plurality of documents, user feedback (e.g., user supplied document tags) received for the plurality of documents, and document metrics received for the plurality of documents. Based on the computed scores, a training document may be selected. The content classifier may be trained using the selected training document.
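
The selection flow described in the abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract does not say how the evidence sources are combined, so the weighted average below, the single threshold, and every name (Document, SIGNALS, suitability, select_training_documents) are assumptions made for illustration only.

```python
from dataclasses import dataclass, field

# Hypothetical names for the five evidence sources listed in the abstract:
# content, metadata, link structure, user feedback, and document metrics.
SIGNALS = ("content", "metadata", "links", "feedback", "metrics")


@dataclass
class Document:
    doc_id: str
    # Assumed: each signal is already reduced to a score in [0, 1]
    # for one category; missing signals count as 0.0.
    signals: dict = field(default_factory=dict)


def suitability(doc, weights):
    """Combine per-signal scores into one score (assumed: weighted average)."""
    total_weight = sum(weights[name] for name in SIGNALS)
    weighted = sum(weights[name] * doc.signals.get(name, 0.0) for name in SIGNALS)
    return weighted / total_weight


def select_training_documents(docs, weights, threshold):
    """Return (doc_id, score) pairs whose score meets the threshold, best first."""
    scored = [(doc.doc_id, suitability(doc, weights)) for doc in docs]
    selected = [pair for pair in scored if pair[1] >= threshold]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    equal_weights = {name: 1.0 for name in SIGNALS}
    candidates = [
        Document("a", {name: 0.9 for name in SIGNALS}),
        Document("b", {name: 0.2 for name in SIGNALS}),
    ]
    # Only documents scoring at or above the threshold are kept for training.
    print(select_training_documents(candidates, equal_weights, threshold=0.5))
```

The selected documents would then be passed to whatever training routine the content classifier uses; that step is outside this sketch.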

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.