Composite locality sensitive hash based processing of documents
US8244767B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | May 21, 2010 |
| Grant date | Aug 14, 2012 |
| Priority date | — |
| Expiry date | Jan 20, 2031 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/325
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Reliable identification of highly similar documents allows such documents to be treated as identical for purposes of document analysis. Identification of highly similar documents can be based on a composite hash value or other value for which the likelihood of two documents having the same value is high if and only if the documents have a high degree of similarity. Prior to performing content based analysis, the composite hash value for the current document is determined and compared to composite hash values of previously analyzed documents. If a match is found, the results of the analysis of the previous document can be applied to the current document. If no match is found, the current document is analyzed.
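The flow described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the composite value is built, so everything here is an assumption made for illustration: word shingles, MinHash as the locality sensitive component, SHA-256 as the underlying hash, and the names `shingles`, `composite_hash`, `AnalysisCache`, `NUM_HASHES`, and `SHINGLE_WORDS`. Because the composite key matches only when every component matches, two documents are likely to share a key only if their shingle sets overlap almost completely.

```python
import hashlib
from typing import Callable, Dict, Set, Tuple

NUM_HASHES = 4      # assumed number of component hashes in the composite
SHINGLE_WORDS = 3   # assumed shingle width, in words


def shingles(text: str, k: int = SHINGLE_WORDS) -> Set[str]:
    """Split text into overlapping k-word shingles (case and spacing ignored)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _seeded_hash(seed: int, shingle: str) -> int:
    """Hash one shingle under one seed; each seed acts as a separate hash function."""
    digest = hashlib.sha256(f"{seed}:{shingle}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def composite_hash(text: str, num_hashes: int = NUM_HASHES) -> Tuple[int, ...]:
    """Combine several MinHash values into one composite key (illustrative only)."""
    grams = shingles(text)
    return tuple(
        min(_seeded_hash(seed, gram) for gram in grams)
        for seed in range(num_hashes)
    )


class AnalysisCache:
    """Reuse analysis results for documents whose composite hashes match."""

    def __init__(self, analyze: Callable[[str], object]) -> None:
        self._analyze = analyze
        self._results: Dict[Tuple[int, ...], object] = {}
        self.analyses_run = 0  # counts how often the full analysis actually ran

    def process(self, text: str) -> object:
        key = composite_hash(text)
        if key in self._results:
            # Match found: apply the earlier document's result to this one.
            return self._results[key]
        # No match: analyze the current document and remember the result.
        result = self._analyze(text)
        self.analyses_run += 1
        self._results[key] = result
        return result
```

In this sketch, a call such as `AnalysisCache(my_analyzer).process(document_text)` runs the hypothetical `my_analyzer` only when no previously processed document produced the same composite key. Raising `NUM_HASHES` makes matches stricter, while lowering it lets less similar documents share a result.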
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.