Patent · US Expired

Structure and method for efficient parallel high-dimensional similarity join

US5987468A · kind A · utility

114Cited by

4References

30Claims

0Family size

Assignee

HITACHI AMERICA, LTD. · US

Inventors

Vineet Singh · Cupertino, US
Khaled Alsabti · Gainesville, US
Sanjay Ranka · Cupertino, US

Key dates

Filing date	Dec 12, 1997
Grant date	Nov 16, 1999
Priority date	—
Expiry date	Dec 12, 2017

Classification

Technology area (CPC Y)Emerging Cross-Sectional Technologies
CPC primaryY10S707/99937
WIPO fieldComputer technology
WIPO sectorElectrical engineering

Abstract

Multidimensional similarity join finds pairs of multi-dimensional points that are within some small distance of each other. Databases in domains such as multimedia and time-series can require a high number of dimensions. The .epsilon.-k-d-B tree has been proposed as a data structure that scales better as number of dimensions increases compared to previous data structures such as the R-tree (and variations), grid-file, and k-d-B tree. We present a cost model of the .epsilon.-k-d-B tree and use it to optimize the leaf size. This new leaf size is shown to be better in most situations compared to previous work that used a constant leaf size. We present novel parallel procedures for the .epsilon.-k-d-B tree. A load-balancing strategy based on equi-depth histograms is shown to work well for uniform or low-skew situations, whereas another based on weighted, equi-depth histograms works far better for high-skew datasets. The latter strategy is only slightly slower than the former strategy for low skew datasets. The weights for the latter strategy are based on the same cost model that is used to determine optimal leaf sizes.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.