Patent · US Expired

Method and apparatus for classification of high dimensional data

US6563952B1 · kind B1 · utility

Cited by	12

References	5

Claims	15

Family size	0

Assignee

HITACHI AMERICA, LTD. · US

Inventors

Anurag Srivastava · Pune, IN
G.D Ramkumar · Mountain View, US
Vineet Singh · Cupertino, US
Sanjay Ranka · Cupertino, US

Key dates

Filing date	Oct 18, 1999
Grant date	May 13, 2003
Priority date	—
Expiry date	Oct 18, 2019

Classification

Technology area (CPC G)	Physics
CPC primary	G06F18/24147
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

The present invention is an apparatus and method for classifying high-dimensional sparse datasets. A raw data training set is flattened by converting it from categorical representation to a boolean representation. The flattened data is then used to build a class model on which new data not in the training set may be classified. In one embodiment, the class model takes the form of a decision tree, and large itemsets and cluster information are used as attributes for classification. In another embodiment, the class model is based on the nearest neighbors of the data to be classified. An advantage of the invention is that, by flattening the data, classification accuracy is increased by eliminating artificial ordering induced on the attributes. Another advantage is that the use of large itemsets and clustering increases classification accuracy.
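Illustrative sketch (not taken from the patent text): the abstract describes flattening categorical records into a boolean representation and, in one embodiment, classifying new data by its nearest neighbors. The Python below shows one plausible reading of those two steps. The function names (`flatten`, `encode`, `classify_knn`), the toy records, the use of Hamming distance, and the majority vote over `k` neighbors are all assumptions made for illustration; the abstract does not specify a distance measure or voting rule, and the decision-tree, large-itemset, and clustering embodiment is not covered here.

```python
from collections import Counter


def flatten(records):
    """Turn categorical records into boolean rows.

    Each distinct (attribute, value) pair becomes its own boolean column,
    so no ordering is imposed on the values of an attribute.
    """
    columns = sorted({(a, v) for r in records for a, v in r.items()})
    rows = [encode(r, columns) for r in records]
    return columns, rows


def encode(record, columns):
    """Encode one record against an existing column list.

    Values never seen in training simply set no column to True.
    """
    present = set(record.items())
    return [col in present for col in columns]


def hamming(x, y):
    """Number of positions where two boolean rows differ (assumed metric)."""
    return sum(a != b for a, b in zip(x, y))


def classify_knn(query_row, rows, labels, k=3):
    """Majority vote over the k training rows closest to query_row."""
    order = sorted(range(len(rows)), key=lambda i: hamming(query_row, rows[i]))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy training set: two categorical attributes, two classes.
train = [
    {"color": "red", "shape": "round"},
    {"color": "red", "shape": "square"},
    {"color": "blue", "shape": "round"},
    {"color": "blue", "shape": "square"},
]
labels = ["A", "A", "B", "B"]

columns, rows = flatten(train)

# "triangle" was never seen in training, so only the color column is set.
query = encode({"color": "red", "shape": "triangle"}, columns)
predicted = classify_knn(query, rows, labels, k=3)
```

In this sketch each training record sets exactly one boolean column per attribute, so the rows stay sparse as the number of distinct values grows, which is the kind of high-dimensional sparse data the abstract refers to.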

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.