Generating overlap estimations between high-volume digital data sets based on multiple sketch vector similarity estimators
US11449523B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Nov 5, 2020 |
| Grant date | Sep 20, 2022 |
| Priority date | — |
| Expiry date | Nov 28, 2040 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06T11/206
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
The present disclosure relates to systems, methods, and non-transitory computer-readable media that estimate the overlap between sets of data samples. In particular, in one or more embodiments, the disclosed systems utilize a sketch-based sampling routine and a flexible, accurate estimator to determine the overlap (e.g., the intersection) between sets of data samples. For example, in some implementations, the disclosed systems generate a sketch vector—such as a one permutation hashing vector—for each set of data samples. The disclosed systems further compare the sketch vectors to determine an equal bin similarity estimator, a lesser bin similarity estimator, and a greater bin similarity estimator. The disclosed systems utilize one or more of the determined similarity estimators in generating an overlap estimation for the sets of data samples.
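The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not give the estimator formulas, so this example only tallies the equal, lesser, and greater bin counts and then applies the standard one permutation hashing Jaccard estimate (equal bins divided by jointly non-empty bins), converted to an intersection size with the identity |A ∩ B| = J(|A| + |B|) / (1 + J). The function names, the SHA-256 based hash, the bin count `k`, the treatment of an empty bin as larger than any value, and the assumption that both set sizes are known are all hypothetical choices made for illustration.

```python
import hashlib

HASH_SPACE = 2 ** 32  # assumed 32-bit hash range for this illustration


def oph_sketch(items, k=64):
    """Build a one permutation hashing sketch: hash each item once,
    map it to one of k bins, and keep the smallest hash seen per bin.
    Bins that receive no item stay None (empty)."""
    sketch = [None] * k
    for item in items:
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:4], "big")
        b = h * k // HASH_SPACE  # contiguous bins; always in [0, k)
        if sketch[b] is None or h < sketch[b]:
            sketch[b] = h
    return sketch


def compare_bins(sketch_a, sketch_b):
    """Tally bins where A's value equals, is less than, or is greater
    than B's value. Assumption: an empty bin compares as +infinity;
    bins empty in both sketches are counted separately."""
    counts = {"equal": 0, "lesser": 0, "greater": 0, "both_empty": 0}
    for a, b in zip(sketch_a, sketch_b):
        if a is None and b is None:
            counts["both_empty"] += 1
            continue
        va = float("inf") if a is None else a
        vb = float("inf") if b is None else b
        if va == vb:
            counts["equal"] += 1
        elif va < vb:
            counts["lesser"] += 1
        else:
            counts["greater"] += 1
    return counts


def estimate_overlap(size_a, size_b, counts):
    """Estimate |A ∩ B| from the equal-bin count alone, using the
    standard OPH Jaccard estimate and J(|A|+|B|)/(1+J). The lesser and
    greater counts are not used here; how the patent combines them is
    not stated in the abstract."""
    non_empty = sum(counts.values()) - counts["both_empty"]
    if non_empty == 0:
        return 0.0
    jaccard = counts["equal"] / non_empty
    return jaccard * (size_a + size_b) / (1.0 + jaccard)


if __name__ == "__main__":
    set_a = set(range(0, 1000))
    set_b = set(range(500, 1500))
    counts = compare_bins(oph_sketch(set_a), oph_sketch(set_b))
    print(counts)
    print(estimate_overlap(len(set_a), len(set_b), counts))
```

With only 64 bins the estimate is coarse; increasing `k` reduces variance at the cost of a larger sketch.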
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.