Patent · US Expired

Classifier tuning based on data similarities

US7089241B1 · kind B1 · utility

Cited by: 120
References: 5
Claims: 27
Family size: 0

Assignee

Inventors

Key dates

Filing date: Dec 22, 2003
Grant date: Aug 8, 2006
Priority date
Expiry date: Dec 22, 2023

Classification

  • Technology area (CPC Y): Emerging Cross-Sectional Technologies
  • CPC primary: Y10S707/99945
  • WIPO field: Digital communication
  • WIPO sector: Electrical engineering

Abstract

A probabilistic classifier is used to classify data items in a data stream. The probabilistic classifier is trained, and an initial classification threshold is set, using unique training and evaluation data sets (i.e., data sets that do not contain duplicate data items). Unique data sets are used for training and in setting the initial classification threshold so as to prevent the classifier from being improperly biased as a result of similarity rates in the training and evaluation data sets that do not reflect similarity rates encountered during operation. During operation, information regarding the actual similarity rates of data items in the data stream is obtained and used to adjust the classification threshold such that misclassification costs are minimized given the actual similarity rates.
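The threshold adjustment described in the abstract can be illustrated with a minimal sketch. It is not the patented method. It assumes, for illustration only, that the evaluation set holds unique items with a classifier score and a true label, that the similarity rate observed in the data stream can be expressed as an average number of copies per item for each class, and that misclassification cost is a weighted count of false positives and false negatives. All names (`EvalItem`, `expected_cost`, `best_threshold`, `reweight`) and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EvalItem:
    score: float         # classifier's probability that the item is positive
    is_positive: bool    # true label
    weight: float = 1.0  # assumed copies per item; 1.0 in a unique data set


def expected_cost(items, threshold, fp_cost, fn_cost):
    """Weighted misclassification cost when score >= threshold means positive."""
    cost = 0.0
    for it in items:
        predicted_positive = it.score >= threshold
        if predicted_positive and not it.is_positive:
            cost += fp_cost * it.weight  # false positive
        elif not predicted_positive and it.is_positive:
            cost += fn_cost * it.weight  # false negative
    return cost


def best_threshold(items, fp_cost, fn_cost):
    """Pick the candidate threshold with the lowest weighted cost."""
    # Candidates: every observed score, plus one value above all scores
    # (which classifies nothing as positive).
    candidates = sorted({it.score for it in items} | {1.0 + 1e-9})
    return min(candidates, key=lambda t: expected_cost(items, t, fp_cost, fn_cost))


def reweight(items, pos_rate, neg_rate):
    """Assumption: apply per-class similarity rates observed during operation."""
    return [
        EvalItem(it.score, it.is_positive, pos_rate if it.is_positive else neg_rate)
        for it in items
    ]


# Hypothetical unique evaluation set.
items = [
    EvalItem(0.9, True),
    EvalItem(0.6, False),
    EvalItem(0.5, True),
    EvalItem(0.2, False),
]

# Initial threshold from the unique set (false positives cost twice as much).
initial = best_threshold(items, fp_cost=2.0, fn_cost=1.0)

# Adjusted threshold once positives are observed to repeat 5x in the stream.
adjusted = best_threshold(reweight(items, 5.0, 1.0), fp_cost=2.0, fn_cost=1.0)

print(initial, adjusted)  # prints "0.9 0.5"
```

In this toy data, heavily duplicated positive items make each missed positive more expensive in aggregate, so the cost-minimizing threshold drops from 0.9 to 0.5. The per-class weighting is only one simple way to model similarity rates. The abstract does not specify how the rates are measured or applied.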

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.