Patent · US Active

Composite locality sensitive hash based processing of documents

US8244767B2 · kind B2 · utility

Cited by: 6
References: 9
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: May 21, 2010
Grant date: Aug 14, 2012
Priority date
Expiry date: Jan 20, 2031

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/325
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Reliable identification of highly similar documents allows such documents to be treated as identical for purposes of document analysis. Identification of highly similar documents can be based on a composite hash value or other value for which the likelihood of two documents having the same value is high if and only if the documents have a high degree of similarity. Prior to performing content based analysis, the composite hash value for the current document is determined and compared to composite hash values of previously analyzed documents. If a match is found, the results of the analysis of the previous document can be applied to the current document. If no match is found, the current document is analyzed.
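
The check-before-analyze flow described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the composite hash is built, so this example assumes a MinHash-style scheme over word shingles, with several minimum hash values combined into one key. The names `shingles`, `composite_hash`, and `AnalysisCache`, and the parameters `k` and `num_hashes`, are hypothetical and are not taken from the patent.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (lowercased, whitespace-normalized)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _seeded_hash(seed, item):
    """Deterministic 64-bit hash of an item under a given seed (assumed hash family)."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def composite_hash(text, num_hashes=4):
    """Combine several min-hash values into one composite key.

    Assumption: MinHash stands in for the locality sensitive hash.
    Two documents get the same key only if all num_hashes minima agree,
    which becomes likely only when their shingle sets overlap heavily.
    """
    sh = shingles(text)
    minima = [min(_seeded_hash(seed, s) for s in sh) for seed in range(num_hashes)]
    joined = ",".join(str(m) for m in minima)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


class AnalysisCache:
    """Reuse the analysis result of a previously seen, highly similar document."""

    def __init__(self, analyze):
        self._analyze = analyze   # the expensive content based analysis
        self._results = {}        # composite hash -> stored analysis result
        self.analyzed = 0         # number of documents actually analyzed

    def process(self, text):
        key = composite_hash(text)
        if key in self._results:
            # Match found: apply the earlier result to the current document.
            return self._results[key]
        # No match: analyze the current document and remember the result.
        result = self._analyze(text)
        self.analyzed += 1
        self._results[key] = result
        return result


cache = AnalysisCache(analyze=lambda doc: len(doc.split()))
cache.process("the quick brown fox jumps over the lazy dog")
cache.process("The quick  brown fox jumps over the lazy dog")  # same key, analysis skipped
```

In this sketch, raising `num_hashes` makes accidental matches between dissimilar documents less likely, at the cost of missing more near-duplicates. That trade-off is a property of the assumed MinHash scheme, not a statement about the patented method.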

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.