Patent · US Active

Hashing techniques for data set similarity determination

US9311403B1 · kind B1 · utility

13 Cited by
2 References
13 Claims
0 Family size

Assignee

Inventor

Key dates

Filing date: Jun 16, 2011
Grant date: Apr 12, 2016
Priority date
Expiry date: Dec 30, 2032

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/5838
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Methods, systems and computer program product embodiments for hashing techniques for determining similarity between data sets are described herein. A method embodiment includes initializing a random number generator with a weighted min-hash value as a seed, wherein the weighted min-hash value approximates a similarity distance between data sets. A number of bits in the weighted min-hash value is determined by uniformly sampling an integer bit value using the random number generator. A system embodiment includes a repository configured to store a plurality of data sets and a hash generator configured to generate weighted min-hash values from the data sets. The system further includes a similarity determiner configured to determine a similarity between the data sets.
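The abstract names the steps without fixing an algorithm, so the following is only a minimal Python sketch under stated assumptions. It substitutes Ioffe's Improved Consistent Weighted Sampling (ICWS) to produce each weighted min-hash value (k*, t*). It reads "uniformly sampling an integer bit value" as drawing a uniform b-bit integer from a random number generator seeded with that value. It then estimates similarity as the fraction of matching b-bit codes, corrected for chance collisions. The function names (`weighted_minhash`, `b_bit_code`, `sketch`, `estimate_similarity`, `weighted_jaccard`) and the parameters `num_hashes` and `b` are hypothetical, not taken from the patent.

```python
import math
import random


# Hypothetical sketch: ICWS (Ioffe, 2010) stands in for the patent's
# unspecified weighted min-hash scheme.
def weighted_minhash(weights, i):
    """Return the i-th weighted min-hash value (k*, t*) of a data set.

    weights maps element ids to non-negative weights.
    """
    best = None
    for k, s in weights.items():
        if s <= 0:
            continue  # zero-weight elements never win the minimum
        # The per-(i, k) random variables must be identical for every data set,
        # so they come from an RNG seeded deterministically with (i, k).
        rng = random.Random(f"{i}:{k}")
        r = rng.gammavariate(2.0, 1.0)
        c = rng.gammavariate(2.0, 1.0)
        beta = rng.random()
        t = math.floor(math.log(s) / r + beta)
        y = math.exp(r * (t - beta))
        a = c / (y * math.exp(r))
        if best is None or a < best[0]:
            best = (a, k, t)
    if best is None:
        raise ValueError("data set has no positive weights")
    return best[1], best[2]


def b_bit_code(k, t, i, b):
    """Seed an RNG with the weighted min-hash value and draw a uniform b-bit integer.

    This is an assumed reading of the abstract's bit-sampling step.
    """
    rng = random.Random(f"{i}:{k}:{t}")
    return rng.getrandbits(b)


def sketch(weights, num_hashes=256, b=2):
    """Compress a weighted data set into num_hashes codes of b bits each."""
    codes = []
    for i in range(num_hashes):
        k, t = weighted_minhash(weights, i)
        codes.append(b_bit_code(k, t, i, b))
    return codes


def estimate_similarity(sk1, sk2, b):
    """Estimate weighted Jaccard similarity from two b-bit sketches.

    Unequal min-hash values still produce equal codes with probability 2**-b,
    so that chance rate is subtracted out.
    """
    match = sum(x == y for x, y in zip(sk1, sk2)) / len(sk1)
    chance = 2.0 ** -b
    return max(0.0, (match - chance) / (1.0 - chance))


def weighted_jaccard(w1, w2):
    """Exact weighted Jaccard similarity, for comparison."""
    keys = set(w1) | set(w2)
    num = sum(min(w1.get(k, 0.0), w2.get(k, 0.0)) for k in keys)
    den = sum(max(w1.get(k, 0.0), w2.get(k, 0.0)) for k in keys)
    return num / den


if __name__ == "__main__":
    doc_a = {"apple": 3.0, "banana": 1.0, "cherry": 2.0}
    doc_b = {"apple": 2.0, "banana": 1.0, "date": 1.0}
    b = 2
    est = estimate_similarity(sketch(doc_a, 512, b), sketch(doc_b, 512, b), b)
    print(f"estimated {est:.3f}, exact {weighted_jaccard(doc_a, doc_b):.3f}")
```

Keeping only b bits per hash shrinks each stored sketch, at the cost of accidental code matches with probability 2^-b, which `estimate_similarity` corrects for. Increasing `num_hashes` lowers the variance of the estimate.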

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.