Patent · US Active

Generating overlap estimations between high-volume digital data sets based on multiple sketch vector similarity estimators

US11449523B2 · kind B2 · utility

Cited by: 0
References: 0
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Nov 5, 2020
Grant date: Sep 20, 2022
Priority date:
Expiry date: Nov 28, 2040

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06T11/206
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

The present disclosure relates to systems, methods, and non-transitory computer-readable media that estimate the overlap between sets of data samples. In particular, in one or more embodiments, the disclosed systems utilize a sketch-based sampling routine and a flexible, accurate estimator to determine the overlap (e.g., the intersection) between sets of data samples. For example, in some implementations, the disclosed systems generate a sketch vector—such as a one permutation hashing vector—for each set of data samples. The disclosed systems further compare the sketch vectors to determine an equal bin similarity estimator, a lesser bin similarity estimator, and a greater bin similarity estimator. The disclosed systems utilize one or more of the determined similarity estimators in generating an overlap estimation for the sets of data samples.
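The abstract names the steps (a sketch vector per data set, a bin-by-bin comparison, an overlap estimate) but does not give formulas. The following is a minimal illustrative sketch of that flow, not the patented method. Everything specific here is an assumption: the function names are hypothetical, SHA-256 stands in for the hash function, each "bin similarity" is taken to be a simple fraction of non-empty bins, and the overlap is derived from the equal-bin fraction using the standard identity |A ∩ B| = J / (1 + J) · (|A| + |B|), where J is the Jaccard similarity.

```python
import hashlib

EMPTY = None  # marker for a bin that received no hash value


def oph_sketch(items, k=256):
    """Build a one permutation hashing sketch: k bins, smallest offset per bin."""
    bin_width = 2**32 // k
    space = bin_width * k  # trim the hash space so it splits into k equal bins
    sketch = [EMPTY] * k
    for item in items:
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big") % space
        b, offset = divmod(h, bin_width)
        if sketch[b] is EMPTY or offset < sketch[b]:
            sketch[b] = offset
    return sketch


def compare_bins(sketch_a, sketch_b):
    """Count bins where A's value equals, is less than, or is greater than B's."""
    equal = lesser = greater = both_empty = 0
    for a, b in zip(sketch_a, sketch_b):
        if a is EMPTY and b is EMPTY:
            both_empty += 1
        elif a == b:
            equal += 1
        elif b is EMPTY or (a is not EMPTY and a < b):
            lesser += 1  # assumption: an empty bin counts as larger than any value
        else:
            greater += 1
    return equal, lesser, greater, both_empty


def bin_fractions(sketch_a, sketch_b):
    """Return (equal, lesser, greater) fractions over bins not empty in both sketches."""
    equal, lesser, greater, both_empty = compare_bins(sketch_a, sketch_b)
    usable = len(sketch_a) - both_empty
    if usable == 0:
        return 0.0, 0.0, 0.0
    return equal / usable, lesser / usable, greater / usable


def estimate_overlap(size_a, size_b, equal_fraction):
    """Convert a Jaccard-style estimate J into an intersection size estimate."""
    return equal_fraction / (1.0 + equal_fraction) * (size_a + size_b)


if __name__ == "__main__":
    a = set(range(0, 2000))
    b = set(range(1000, 3000))  # true overlap with a is 1000 items
    eq, lt, gt = bin_fractions(oph_sketch(a), oph_sketch(b))
    print(eq, lt, gt, estimate_overlap(len(a), len(b), eq))
```

In this sketch only the equal-bin fraction feeds the overlap estimate; the lesser-bin and greater-bin fractions are reported for inspection. How the patent combines the three estimators is not stated in the abstract.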

Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.