Patent · US Active

Finding duplicate passages of text in a collection of text

US10585975B2 · kind B2 · utility

Cited by: 0
References: 2
Claims: 19
Family size: 0

Assignee

Inventor

Key dates

Filing date: Mar 2, 2012
Grant date: Mar 10, 2020
Priority date
Expiry date: Apr 25, 2034

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/194
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A novel system and computer-implemented method for quickly and efficiently finding and reporting all clones within a large corpus of text. This is achieved by tokenizing the corpus, computing a rolling hash, filtering for hashes that occur more than once, and constructing an equivalence relation over these hashes in which hashes are equated if they are part of the same instance of duplication. The equivalence relation is then used to report all detected clones.
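
The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not specify the tokenizer, the hash function, the window size, or how the equivalence relation is built, so everything below is an assumption made for illustration. In particular, the names `tokenize`, `rolling_hashes`, `find_clones`, the window size `k`, the polynomial hash constants, and the rule "two repeated hashes are equated when their windows start at adjacent token positions" (implemented with a union-find structure) are all hypothetical choices. The sketch also trusts hash equality and does not re-check the underlying tokens for collisions.

```python
import re
import zlib
from collections import defaultdict

BASE = 1_000_003          # assumed polynomial base
MOD = (1 << 61) - 1       # assumed modulus (a Mersenne prime)


def tokenize(text):
    """Assumed tokenizer: lowercase word tokens, punctuation dropped."""
    return re.findall(r"\w+", text.lower())


def rolling_hashes(tokens, k):
    """Return one polynomial hash per window of k consecutive tokens."""
    if len(tokens) < k:
        return []
    codes = [zlib.crc32(t.encode("utf-8")) for t in tokens]
    top = pow(BASE, k - 1, MOD)       # weight of the oldest token in a window
    h = 0
    for c in codes[:k]:
        h = (h * BASE + c) % MOD
    out = [h]
    for i in range(k, len(codes)):
        # Drop the oldest token, shift, and append the newest one.
        h = ((h - codes[i - k] * top) * BASE + codes[i]) % MOD
        out.append(h)
    return out


def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def find_clones(text, k=4):
    """Return (tokens, clone groups); each group is a list of (start, end) token spans."""
    tokens = tokenize(text)
    hashes = rolling_hashes(tokens, k)

    # Filter: keep only hashes that occur at more than one position.
    positions = defaultdict(list)
    for i, h in enumerate(hashes):
        positions[h].append(i)
    repeated = {h for h, ps in positions.items() if len(ps) > 1}

    # Equivalence relation (assumed rule): merge repeated hashes of adjacent windows.
    parent = {h: h for h in repeated}
    for a, b in zip(hashes, hashes[1:]):
        if a in repeated and b in repeated:
            ra, rb = find(parent, a), find(parent, b)
            if ra != rb:
                parent[rb] = ra

    # Report: each maximal run of repeated windows is one occurrence of a clone,
    # grouped by the equivalence class of its hashes.
    clones = defaultdict(list)
    i = 0
    while i < len(hashes):
        if hashes[i] not in repeated:
            i += 1
            continue
        j = i
        while j + 1 < len(hashes) and hashes[j + 1] in repeated:
            j += 1
        clones[find(parent, hashes[i])].append((i, j + k))  # end is exclusive
        i = j + 1
    return tokens, list(clones.values())


if __name__ == "__main__":
    sample = ("the quick brown fox jumps over the lazy dog . filler words here . "
              "the quick brown fox jumps over the lazy dog again")
    toks, groups = find_clones(sample, k=4)
    for group in groups:
        for start, end in group:
            print(start, end, " ".join(toks[start:end]))
```

Each reported span is a half-open range of token indices, so `tokens[start:end]` recovers the duplicated passage. A fuller treatment would map token indices back to character offsets in the corpus and would separate distinct clones whose occurrences happen to sit directly next to each other, which this simplified merging rule does not attempt.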

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.