Patent · US Active

Similar document detection and electronic discovery

US9418144B2 · kind B2 · utility

Cited by: 4
References: 0
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Sep 24, 2015
Grant date: Aug 16, 2016
Priority date
Expiry date: Sep 24, 2035

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/9535
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Systems and methods are disclosed for performing duplicate document analyses to identify textually identical or similar documents, which may be electronic documents stored within an electronic discovery platform. A process is described which includes representing each of the documents, including a target document, as a relatively large n-tuple vector and also as a relatively small m-tuple vector, performing a series of calculations on the set of m-tuple vectors to identify a set of documents which are candidate near-duplicates to the target document, and then filtering the candidate set of near-duplicate documents based upon the distance of their n-tuple vectors from the n-tuple vector of the target document.
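
The two-stage process in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the vectors are built, which calculations run on the m-tuple vectors, or which distance is used, so everything specific here is an assumption made for illustration: hashed token counts as the vector representation, Euclidean distance at both stages, and the hypothetical names hashed_counts, euclidean, near_duplicates, coarse_max, and fine_max. It is not the claimed method.

```python
import math
import zlib


def hashed_counts(text, dims):
    """Map a document to a dims-tuple of token counts.

    Assumption: tokens are whitespace-separated words, bucketed with a
    stable CRC32 hash. The patent abstract does not specify this.
    """
    vec = [0] * dims
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dims] += 1
    return tuple(vec)


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def near_duplicates(docs, target_id, n=1024, m=16, coarse_max=3.0, fine_max=2.0):
    """Return ids of documents that look like near-duplicates of the target.

    docs maps a document id to its text. The thresholds and vector sizes
    are illustrative defaults, not values taken from the patent.
    """
    # Represent every document twice: a large n-tuple and a small m-tuple.
    large = {doc_id: hashed_counts(text, n) for doc_id, text in docs.items()}
    small = {doc_id: hashed_counts(text, m) for doc_id, text in docs.items()}

    # Stage 1: cheap pass over the small vectors to collect candidates.
    candidates = [
        doc_id
        for doc_id in docs
        if doc_id != target_id
        and euclidean(small[doc_id], small[target_id]) <= coarse_max
    ]

    # Stage 2: filter candidates by distance between the large vectors.
    return [
        doc_id
        for doc_id in candidates
        if euclidean(large[doc_id], large[target_id]) <= fine_max
    ]
```

Calling near_duplicates(docs, "target") on a dictionary of id-to-text pairs returns the ids that survive both stages. The point of the design is that the small vectors are cheap to compare against the whole collection, while the more expensive large-vector comparison runs only on the short candidate list.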

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.