Similar document detection and electronic discovery
US9418144B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Sep 24, 2015 |
| Grant date | Aug 16, 2016 |
| Priority date | — |
| Expiry date | Sep 24, 2035 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/9535
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Systems and methods are disclosed for performing duplicate document analyses to identify textually identical or similar documents, which may be electronic documents stored within an electronic discovery platform. A process is described which includes representing each of the documents, including a target document, as a relatively large n-tuple vector and also as a relatively small m-tuple vector, performing a series of calculations on the set of m-tuple vectors to identify a set of documents which are candidate near-duplicates to the target document, and then filtering the candidate set of near-duplicate documents based upon the distance of their n-tuple vectors from the n-tuple vector of the target document.
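The two-stage process in the abstract can be illustrated with a minimal sketch. The abstract does not say how the vectors are built or which distance is used, so everything below is an assumption made for illustration and not the patented method: the n-tuple vector is a hashed word-count vector, the m-tuple vector is obtained by summing contiguous chunks of it, both stages use L1 distance, and the names `n_vector`, `m_vector`, `near_duplicates` and the values of `N`, `M` and `max_dist` are hypothetical.

```python
import zlib
from typing import Dict, List

N = 1024  # assumed length of the large n-tuple vector
M = 16    # assumed length of the small m-tuple vector (must divide N)


def n_vector(text: str, n: int = N) -> List[int]:
    """Hashed word-count vector (an assumed representation)."""
    vec = [0] * n
    for word in text.lower().split():
        # crc32 is used because it is stable across processes, unlike hash().
        vec[zlib.crc32(word.encode("utf-8")) % n] += 1
    return vec


def m_vector(nvec: List[int], m: int = M) -> List[int]:
    """Fold the large vector into m buckets by summing contiguous chunks."""
    chunk = len(nvec) // m
    return [sum(nvec[i * chunk:(i + 1) * chunk]) for i in range(m)]


def l1(a: List[int], b: List[int]) -> int:
    """L1 (Manhattan) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def near_duplicates(target_id: str, docs: Dict[str, str],
                    max_dist: int = 4) -> List[str]:
    """Return ids of documents within max_dist of the target document."""
    nvecs = {doc_id: n_vector(text) for doc_id, text in docs.items()}
    mvecs = {doc_id: m_vector(vec) for doc_id, vec in nvecs.items()}

    # Stage 1: cheap candidate selection on the small vectors.
    candidates = [
        doc_id for doc_id in docs
        if doc_id != target_id
        and l1(mvecs[doc_id], mvecs[target_id]) <= max_dist
    ]
    # Stage 2: filter candidates by distance between the large vectors.
    return sorted(
        doc_id for doc_id in candidates
        if l1(nvecs[doc_id], nvecs[target_id]) <= max_dist
    )


docs = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "the quick brown fox jumps over the lazy cat",
    "c": "completely unrelated text about patent filing deadlines",
}
result = near_duplicates("a", docs)
```

With this particular folding, the L1 distance between two small vectors can never exceed the L1 distance between the corresponding large vectors, so applying the same threshold in stage 1 does not discard any document that stage 2 would have accepted. That property belongs to the assumed construction, not to anything stated in the abstract.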
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.