Patent · US Active

Disk-based probabilistic set-similarity indexes

US7610283B2 · kind B2 · utility

Cited by: 12
References: 9
Claims: 16
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jun 12, 2007
Grant date: Oct 27, 2009
Priority date
Expiry date: Mar 10, 2028

Classification

  • Technology area (CPC Y): Emerging Cross-Sectional Technologies
  • CPC primary: Y10S707/99937
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Input set indexing for set-similarity lookups. The architecture provides input to an indexing process that enables more efficient lookups for large data sets (e.g., disk-based) without requiring a full scan of the input. A new index structure is provided, the output of which is exact, rather than approximate. The similarity of two sets is specified using a similarity function that maps two sets to a numeric value that represents the similarity of the two sets. Threshold-based lookups are addressed, where two sets are considered similar if the numeric similarity score is above a threshold. The structure efficiently identifies all input sets within a distance k (e.g., a Hamming distance) of the query set. Additional information in the form of the frequency of elements (the number of input sets in which an element occurs) is used to improve index performance.
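
The definitions in the abstract can be illustrated with a minimal Python sketch. This is not the patented index structure: it is a brute-force full scan, shown only to pin down what an exact lookup must return. The function names are hypothetical, Jaccard similarity is assumed as one possible similarity function, and the Hamming distance between two sets is taken to be the size of their symmetric difference.

```python
from collections import Counter
from typing import Callable, FrozenSet, List, Sequence

StrSet = FrozenSet[str]


def jaccard(a: StrSet, b: StrSet) -> float:
    """Assumed similarity function: |a & b| / |a | b|, with 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def hamming(a: StrSet, b: StrSet) -> int:
    """Hamming distance between sets, taken as the size of the symmetric difference."""
    return len(a ^ b)


def lookup_by_threshold(
    query: StrSet,
    sets: Sequence[StrSet],
    threshold: float,
    sim: Callable[[StrSet, StrSet], float] = jaccard,
) -> List[int]:
    """Positions of all input sets whose similarity to the query is above the threshold.

    Full scan: this is the baseline an index is meant to avoid.
    """
    return [i for i, s in enumerate(sets) if sim(query, s) > threshold]


def lookup_within_distance(query: StrSet, sets: Sequence[StrSet], k: int) -> List[int]:
    """Positions of all input sets within Hamming distance k of the query (full scan)."""
    return [i for i, s in enumerate(sets) if hamming(query, s) <= k]


def element_frequencies(sets: Sequence[StrSet]) -> Counter:
    """For each element, the number of input sets in which it occurs."""
    counts: Counter = Counter()
    for s in sets:
        counts.update(s)  # each set contributes at most 1 per element
    return counts


# Hypothetical toy input, not taken from the patent.
input_sets = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b", "d"}),
    frozenset({"x", "y"}),
]
query = frozenset({"a", "b", "c"})
similar = lookup_by_threshold(query, input_sets, 0.4)
near = lookup_within_distance(query, input_sets, 2)
freq = element_frequencies(input_sets)
```

An exact index in the sense of the abstract would have to return the same positions as these scans while reading only part of the input. The frequency table is the kind of additional information the abstract says is used to improve index performance; how the patent uses it is not stated in the abstract.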

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.