Searching in multilevel clustered vector-based data
US11449704B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jan 16, 2020 |
| Grant date | Sep 20, 2022 |
| Priority date | — |
| Expiry date | Oct 18, 2040 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06V10/771
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A multilevel clustered data set for multidimensional vectors is created by defining a plurality of clusters based on each of the signed dimensions of the vectors, each dimension functioning as an axis. Vectors are assigned to each cluster by measuring cosine similarity between a vector and each axis. Sub-clusters are defined as ranges of cosine similarity values within a cluster, and each vector is assigned into the appropriate range based on its cosine similarity value with the axis of the cluster. Searching for a matching vector to a new vector is efficiently achieved in near-constant time by measuring cosine similarity for the new vector with each axis to identify the closest cluster, reusing the cosine similarity of the new vector and axis to determine which sub-cluster corresponds to the appropriate range of values, and then comparing each vector within the sub-cluster until a match is found or ruled out.
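The two-level lookup described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The class name `MultilevelIndex`, the helper names, the equal-width split of cosine values into `n_bins` ranges, and the `threshold` used to decide what counts as a "match" are all assumptions made for illustration; the abstract specifies none of them. The sketch relies on the fact that the cosine similarity between a vector and a signed coordinate axis is that component divided by the vector's norm, so the closest axis is simply the component with the largest magnitude.

```python
import math
from collections import defaultdict


def closest_axis(vec):
    """Return ((dimension, sign), cosine) for the signed axis most similar to vec."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("zero vector has no cosine similarity")
    # Cosine similarity with the signed axis +/-e_i is +/-vec[i] / ||vec||,
    # so the closest axis is the component with the largest magnitude.
    dim = max(range(len(vec)), key=lambda i: abs(vec[i]))
    sign = 1 if vec[dim] >= 0 else -1
    return (dim, sign), abs(vec[dim]) / norm


def cosine(a, b):
    """Plain cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class MultilevelIndex:
    """Hypothetical index: cluster = signed axis, sub-cluster = cosine range."""

    def __init__(self, n_bins=10):
        self.n_bins = n_bins              # assumed: equal-width ranges over [0, 1]
        self.buckets = defaultdict(list)  # (axis, bin) -> list of stored vectors

    def _key(self, vec):
        axis, cos = closest_axis(vec)
        # Reuse the same cosine value to pick the sub-cluster range.
        bin_id = min(int(cos * self.n_bins), self.n_bins - 1)
        return axis, bin_id

    def add(self, vec):
        self.buckets[self._key(vec)].append(tuple(vec))

    def search(self, query, threshold=0.99):
        """Scan only the query's sub-cluster; return the first match or None."""
        for candidate in self.buckets.get(self._key(query), []):
            if cosine(query, candidate) >= threshold:
                return candidate
        return None


index = MultilevelIndex(n_bins=10)
index.add([1.0, 0.1, 0.0])
index.add([0.0, -2.0, 0.1])
hit = index.search([2.0, 0.2, 0.0])   # parallel to the first stored vector
miss = index.search([0.0, 1.0, 0.0])  # closest to +axis 1, where nothing is stored
```

Because only one sub-cluster is scanned, the cost of a lookup depends on the size of that bucket rather than on the whole data set. One consequence of this simplified version is that a near match whose cosine value falls just across a range boundary lands in a neighbouring bucket and is not found; whether and how the patent handles that case is not stated in the abstract.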
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.