Structure and method for efficient parallel high-dimensional similarity join
US5987468A · kind A · utility
Assignee
Inventors
Key dates
| Filing date | Dec 12, 1997 |
| Grant date | Nov 16, 1999 |
| Priority date | — |
| Expiry date | Dec 12, 2017 |
Classification
- Technology area (CPC Y)Emerging Cross-Sectional Technologies
- CPC primaryY10S707/99937
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
Multidimensional similarity join finds pairs of multi-dimensional points that are within some small distance of each other. Databases in domains such as multimedia and time-series can require a high number of dimensions. The .epsilon.-k-d-B tree has been proposed as a data structure that scales better as number of dimensions increases compared to previous data structures such as the R-tree (and variations), grid-file, and k-d-B tree. We present a cost model of the .epsilon.-k-d-B tree and use it to optimize the leaf size. This new leaf size is shown to be better in most situations compared to previous work that used a constant leaf size. We present novel parallel procedures for the .epsilon.-k-d-B tree. A load-balancing strategy based on equi-depth histograms is shown to work well for uniform or low-skew situations, whereas another based on weighted, equi-depth histograms works far better for high-skew datasets. The latter strategy is only slightly slower than the former strategy for low skew datasets. The weights for the latter strategy are based on the same cost model that is used to determine optimal leaf sizes.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.