Patent · US Expired

Density-based indexing method for efficient execution of high dimensional nearest-neighbor queries on large databases

US6263334A · kind A · utility

253Cited by

7References

39Claims

0Family size

Assignee

Microsoft Corporation · US

Inventors

Usama M. Fayyad · Mercer Island, US
Kristin Bennett · Larkspur, US
Dan Geiger · Rekhasim, IL

Key dates

Filing date	Nov 11, 1998
Grant date	Jul 17, 2001
Priority date	—
Expiry date	Nov 11, 2018

Classification

Technology area (CPC Y)Emerging Cross-Sectional Technologies
CPC primaryY10S707/99943
WIPO fieldComputer technology
WIPO sectorElectrical engineering

Abstract

Method and apparatus for efficiently performing nearest neighbor queries on a database of records wherein each record has a large number of attributes by automatically extracting a multidimensional index from the data. The method is based on first obtaining a statistical model of the content of the data in the form of a probability density function. This density is then used to decide how data should be reorganized on disk for efficient nearest neighbor queries. At query time, the model decides the order in which data should be scanned. It also provides the means for evaluating the probability of correctness of the answer found so far in the partial scan of data determined by the model. In this invention a clustering process is performed on the database to produce multiple data clusters. Each cluster is characterized by a cluster model. The set of clusters represent a probability density function in the form of a mixture model. A new database of records is built having an augmented record format that contains the original record attributes and an additional record attribute containing a cluster number for each record based on the clustering step. The cluster model uses a probabilit…

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.