Consistent weighted sampling of multisets and distributions
US7716144B2 · kind B2 · utility
Cited by: 3
References: 8
Claims: 18
Family size: 0
Assignee
Inventors
Key dates
| Filing date | Mar 22, 2007 |
| Grant date | May 11, 2010 |
| Priority date | — |
| Expiry date | Sep 4, 2028 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/194
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Techniques are provided that identify near-duplicate items in large collections of items. A list of (value, frequency) pairs is received, and a sample (value, instance) is returned. The value is chosen from the values in the list, and the instance is a value less than the frequency paired with that value. The sample is chosen in such a way that the probability of selecting the same sample from two lists is equal to the similarity of the two lists.
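The property described in the abstract can be illustrated with a minimal sketch. This is not the patented method: it assumes non-negative integer frequencies and uses naive min-hashing, expanding each (value, frequency) pair into the instances 0 through frequency - 1 and keeping the one with the smallest hash. It also assumes "similarity" means the weighted Jaccard similarity (sum of minimum frequencies divided by sum of maximum frequencies). The names `consistent_sample`, `weighted_jaccard`, and `estimate_similarity` are hypothetical, chosen only for illustration.

```python
import hashlib


def _hash(seed, value, instance):
    """Deterministic 64-bit hash of a (seed, value, instance) triple."""
    data = f"{seed}|{value!r}|{instance}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def consistent_sample(pairs, seed=0):
    """Return the (value, instance) with the smallest hash, or None if empty.

    Assumption: frequencies are non-negative integers. Each pair is expanded
    into instances 0 .. frequency - 1, so instance < frequency always holds.
    """
    best, best_key = None, None
    for value, frequency in pairs:
        for instance in range(frequency):
            key = _hash(seed, value, instance)
            if best_key is None or key < best_key:
                best, best_key = (value, instance), key
    return best


def weighted_jaccard(a, b):
    """Sum of per-value minimum frequencies over sum of maximum frequencies."""
    fa, fb = dict(a), dict(b)
    keys = set(fa) | set(fb)
    num = sum(min(fa.get(k, 0), fb.get(k, 0)) for k in keys)
    den = sum(max(fa.get(k, 0), fb.get(k, 0)) for k in keys)
    return num / den if den else 0.0


def estimate_similarity(a, b, trials=2000):
    """Fraction of independent seeds for which both lists yield the same sample."""
    matches = sum(
        consistent_sample(a, seed) == consistent_sample(b, seed)
        for seed in range(trials)
    )
    return matches / trials


if __name__ == "__main__":
    doc_a = [("x", 3), ("y", 1)]
    doc_b = [("x", 2), ("y", 1), ("z", 1)]
    print("exact similarity:", weighted_jaccard(doc_a, doc_b))
    print("estimated from samples:", estimate_similarity(doc_a, doc_b))
```

Because both lists hash the same (value, instance) instances with the same seed, they agree on a sample exactly when the overall minimum falls in their shared instances, which is why the match rate tracks the similarity. The cost of this sketch grows with the total frequency, so it is only practical for small integer counts.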
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.