Finding duplicate passages of text in a collection of text
US10585975B2 · kind B2 · utility
Cited by: 0
References: 2
Claims: 19
Family size: 0
Assignee
Inventor
Key dates
| Filing date | Mar 2, 2012 |
| Grant date | Mar 10, 2020 |
| Priority date | — |
| Expiry date | Apr 25, 2034 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/194
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A novel system and computer-implemented method for quickly and efficiently finding and reporting all clones within a large corpus of text. This is achieved by tokenizing the corpus, computing a rolling hash over the tokens, filtering for hashes that occur more than once, and constructing an equivalence relation over these hashes in which hashes are equated if they are part of the same instance of duplication. The equivalence relation is then used to report all detected clones.
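The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `find_clones`, the whitespace tokenizer, the fixed `window` size, the polynomial hash constants, and the rule that hashes of adjacent windows are equated when both are repeated are all assumptions made for illustration. Hash collisions are ignored here; a careful implementation would verify matches against the tokens.

```python
from collections import defaultdict

BASE, MOD = 257, (1 << 61) - 1  # illustrative hash constants (assumed)

def find_clones(text, window=3):
    """Return groups of duplicated passages, one list of strings per group."""
    tokens = text.split()                      # naive whitespace tokenizer
    n = len(tokens) - window + 1               # number of token windows
    if n <= 0:
        return []
    ids = {}
    codes = [ids.setdefault(t, len(ids) + 1) for t in tokens]

    # Rolling polynomial hash over every window of `window` tokens.
    top = pow(BASE, window - 1, MOD)
    h = 0
    for c in codes[:window]:
        h = (h * BASE + c) % MOD
    hashes = [h]
    for i in range(window, len(codes)):
        h = ((h - codes[i - window] * top) * BASE + codes[i]) % MOD
        hashes.append(h)

    # Keep only hashes that occur more than once.
    positions = defaultdict(list)
    for i, h in enumerate(hashes):
        positions[h].append(i)
    repeated = {h: p for h, p in positions.items() if len(p) > 1}

    # Equivalence relation (union-find): equate the hashes of adjacent
    # windows when both are repeated, so they join one duplication instance.
    parent = {h: h for h in repeated}

    def find(h):
        while parent[h] != h:
            parent[h] = parent[parent[h]]      # path halving
            h = parent[h]
        return h

    for i in range(n - 1):
        a, b = hashes[i], hashes[i + 1]
        if a in repeated and b in repeated:
            parent[find(a)] = find(b)

    # Report: per equivalence class, merge consecutive window starts
    # into maximal token runs.
    groups = defaultdict(list)
    for h, ps in repeated.items():
        groups[find(h)].extend(ps)
    result = []
    for ps in groups.values():
        ps.sort()
        runs, start, prev = [], ps[0], ps[0]
        for p in ps[1:]:
            if p != prev + 1:
                runs.append((start, prev + window))
                start = p
            prev = p
        runs.append((start, prev + window))
        result.append([" ".join(tokens[s:e]) for s, e in runs])
    return result
```

For example, `find_clones("a b c d x a b c d y", 3)` groups the two occurrences of `a b c d` into one clone class. Using union-find keeps the merging step close to linear in the number of windows, which is one plausible way to meet the abstract's "quickly and efficiently" goal.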
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.