Patent · US Active

Consistent weighted sampling of multisets and distributions

US7716144B2 · kind B2 · utility

Cited by: 3
References: 8
Claims: 18
Family size: 0

Assignee

Inventors

Key dates

Filing date: Mar 22, 2007
Grant date: May 11, 2010
Priority date
Expiry date: Sep 4, 2028

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/194
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Techniques are provided that identify near-duplicate items in large collections of items. A list of (value, frequency) pairs is received, and a sample (value, instance) is returned. The value is chosen from the values of the first list, and the instance is a value less than frequency, in such a way that the probability of selecting the same sample from two lists is equal to the similarity of the two lists.
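
The sampling property described in the abstract can be illustrated with a minimal Python sketch. This is not the patented procedure: it is a naive stand-in that assumes integer frequencies, expands each (value, frequency) pair into its (value, instance) pairs, and keeps the pair with the smallest seeded hash. Under the further assumption that "similarity" means the weighted Jaccard similarity (sum of minimum frequencies divided by sum of maximum frequencies), two lists then select the same sample with probability approximately equal to that similarity. The names `consistent_sample`, `weighted_jaccard`, and `estimate_similarity` are hypothetical and chosen for illustration only.

```python
import hashlib


def _hash(seed, value, instance):
    """Deterministic 64-bit hash of (seed, value, instance)."""
    data = repr((seed, value, instance)).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def consistent_sample(pairs, seed=0):
    """Return a (value, instance) sample with 0 <= instance < frequency.

    Naive illustration: cost grows with the total frequency, so this is
    only practical for small integer frequencies.
    """
    best = None
    for value, frequency in pairs:
        for instance in range(frequency):
            h = _hash(seed, value, instance)
            if best is None or h < best[0]:
                best = (h, value, instance)
    if best is None:
        raise ValueError("cannot sample from an empty multiset")
    return best[1], best[2]


def weighted_jaccard(a, b):
    """Assumed similarity measure: sum of min frequencies over sum of max."""
    da, db = dict(a), dict(b)
    keys = set(da) | set(db)
    num = sum(min(da.get(k, 0), db.get(k, 0)) for k in keys)
    den = sum(max(da.get(k, 0), db.get(k, 0)) for k in keys)
    return num / den


def estimate_similarity(a, b, trials=500):
    """Fraction of independent seeds for which both lists pick the same sample."""
    matches = sum(
        consistent_sample(a, seed) == consistent_sample(b, seed)
        for seed in range(trials)
    )
    return matches / trials


if __name__ == "__main__":
    doc_a = [("apple", 2), ("pear", 1)]
    doc_b = [("apple", 1), ("pear", 1)]
    print("exact similarity:", weighted_jaccard(doc_a, doc_b))
    print("estimated from samples:", estimate_similarity(doc_a, doc_b))
```

Because each seed yields one sample per list, comparing samples across many seeds gives an estimate of the similarity without comparing the lists directly, which is what makes this style of sampling useful for near-duplicate detection in large collections.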

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.