Patent · US Active

Finding duplicate passages of text in a collection of text

US10585975B2 · kind B2 · utility

Cited by	0

References	2

Claims	19

Family size	0

Assignee

GITHUB SOFTWARE UK LTD. · GB

Inventor

Julian Tibble · Felton, GB

Key dates

Filing date	Mar 2, 2012
Grant date	Mar 10, 2020
Priority date	—
Expiry date	Apr 25, 2034

Classification

Technology area (CPC G)	Physics
CPC primary	G06F40/194
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A novel system and computer-implemented method for quickly and efficiently finding and reporting all clones within a large corpus of text. This is achieved by tokenizing the corpus, computing a rolling hash over the tokens, filtering for hashes that occur more than once, and constructing an equivalence relation over these hashes in which two hashes are equated if they are part of the same instance of duplication. The equivalence relation is then used to report all detected clones.
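
The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the window size, the hash base and modulus, the regex tokenizer, and the rule that two repeated hashes are equated when their windows start at adjacent token positions are all assumptions made for illustration, and the names `tokenize`, `rolling_hashes`, and `find_clones` are hypothetical. The sketch also trusts hash equality, whereas a production system would likely verify matches against the tokens.

```python
import re
from collections import defaultdict

WINDOW = 4            # assumed window size, in tokens
BASE = 257            # assumed polynomial hash base
MOD = (1 << 61) - 1   # assumed modulus

def tokenize(text):
    """Split text into lowercase word tokens (assumed tokenizer)."""
    return re.findall(r"\w+", text.lower())

def rolling_hashes(tokens, window=WINDOW):
    """Yield (start, hash) for every window of `window` consecutive tokens."""
    if len(tokens) < window:
        return
    ids = {}
    codes = [ids.setdefault(t, len(ids) + 1) for t in tokens]
    top = pow(BASE, window - 1, MOD)
    h = 0
    for c in codes[:window]:
        h = (h * BASE + c) % MOD
    yield 0, h
    for i in range(window, len(codes)):
        # Drop the oldest token, shift, and append the newest token.
        h = ((h - codes[i - window] * top) * BASE + codes[i]) % MOD
        yield i - window + 1, h

def find_clones(text, window=WINDOW):
    """Return a sorted list of clones; each clone is a list of
    (start, end) token spans, with `end` exclusive."""
    tokens = tokenize(text)
    hashes = dict(rolling_hashes(tokens, window))   # start -> hash
    positions = defaultdict(list)
    for start, h in hashes.items():
        positions[h].append(start)
    # Filter: keep only hashes that occur more than once.
    repeated = {h for h, ps in positions.items() if len(ps) > 1}

    # Equivalence relation over repeated hashes, kept as a union-find forest.
    parent = {h: h for h in repeated}

    def find(h):
        while parent[h] != h:
            parent[h] = parent[parent[h]]
            h = parent[h]
        return h

    # Assumed rule: repeated hashes at adjacent positions belong together.
    for start, h in hashes.items():
        nxt = hashes.get(start + 1)
        if h in repeated and nxt in repeated:
            parent[find(h)] = find(nxt)

    classes = defaultdict(list)
    for h in repeated:
        classes[find(h)].extend(positions[h])

    clones = []
    for starts in classes.values():
        starts.sort()
        spans = []
        for s in starts:
            if spans and s <= spans[-1][1] - window + 1:
                spans[-1][1] = s + window      # extend the current run
            else:
                spans.append([s, s + window])  # begin a new instance
        clones.append([tuple(sp) for sp in spans])
    return sorted(clones)

print(find_clones("a b c d e f x a b c d e f y"))
```

In this example, the six-token passage `a b c d e f` appears twice, so the three repeated window hashes collapse into one equivalence class and the sketch reports a single clone with two instances rather than three separate overlapping matches.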

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.