Hashing techniques for data set similarity determination
US9311403B1 · kind B1 · utility
Assignee
Inventor
Key dates
| Filing date | Jun 16, 2011 |
| Grant date | Apr 12, 2016 |
| Priority date | — |
| Expiry date | Dec 30, 2032 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/5838
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Methods, systems, and computer program product embodiments for hashing techniques for determining similarity between data sets are described herein. A method embodiment includes initializing a random number generator with a weighted min-hash value as a seed, wherein the weighted min-hash value approximates a similarity distance between data sets. A number of bits in the weighted min-hash value is determined by uniformly sampling an integer bit value using the random number generator. A system embodiment includes a repository configured to store a plurality of data sets and a hash generator configured to generate weighted min-hash values from the data sets. The system further includes a similarity determiner configured to determine a similarity between the data sets.
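The following Python sketch is one possible reading of the steps named in the abstract, not the patented implementation. It assumes integer element weights and uses a simple "expand each element into weight-many copies" min-hash as a stand-in weighted scheme; the abstract does not say which weighted min-hash construction is used. All function names (`weighted_minhash`, `compress_to_bits`, `signature`, `estimate_similarity`) and the parameters `num_hashes` and `bits` are hypothetical.

```python
import hashlib
import random
from typing import Dict, List


def _hash64(seed: int, key: str, copy: int) -> int:
    """Deterministic 64-bit hash of (seed, key, copy) built from SHA-256."""
    data = f"{seed}:{key}:{copy}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def weighted_minhash(weights: Dict[str, int], seed: int) -> int:
    """Assumed stand-in scheme: treat an element of weight w as w distinct
    copies and take the minimum hash over all copies. Two data sets then
    share a min-hash with probability close to their weighted Jaccard
    similarity. Raises ValueError if there are no positive weights."""
    return min(
        _hash64(seed, key, k)
        for key, w in weights.items()
        for k in range(w)
    )


def compress_to_bits(minhash: int, bits: int) -> int:
    """Seed a random number generator with the min-hash value and uniformly
    sample an integer that fits in `bits` bits."""
    rng = random.Random(minhash)
    return rng.randrange(1 << bits)


def signature(weights: Dict[str, int], num_hashes: int = 128, bits: int = 8) -> List[int]:
    """One compressed min-hash per hash seed."""
    return [
        compress_to_bits(weighted_minhash(weights, s), bits)
        for s in range(num_hashes)
    ]


def estimate_similarity(sig_a: List[int], sig_b: List[int], bits: int) -> float:
    """Fraction of matching positions, corrected for the chance that two
    different min-hashes compress to the same `bits`-bit value."""
    match_rate = sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
    accidental = 1.0 / (1 << bits)
    return max(0.0, (match_rate - accidental) / (1.0 - accidental))


if __name__ == "__main__":
    doc_a = {"red": 3, "green": 1, "blue": 2}
    doc_b = {"red": 2, "green": 1, "blue": 2, "black": 1}
    sig_a, sig_b = signature(doc_a), signature(doc_b)
    print(estimate_similarity(sig_a, sig_b, bits=8))
```

Because equal seeds always produce equal samples, two data sets with the same min-hash keep matching after compression, while differing min-hashes collide only with probability of roughly 1 in 2^bits. That is why `estimate_similarity` subtracts the accidental match rate before reporting a value.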
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.