Patent · US Active

Estimating document similarity using bit-strings

US8594239B2 · kind B2 · utility

Cited by: 2
References: 5
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Feb 21, 2011
Grant date: Nov 26, 2013
Priority date
Expiry date: Feb 16, 2032

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/316
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Each of a plurality of documents is divided into samples. Small bit-strings are generated for selected samples from each document and combined into a sketch for that document. Because the bit-strings are small (e.g., only one, two, or three bits long), the resulting sketches are smaller than those produced by previous sketching methods and therefore require less storage space. The sketches are then compared to identify documents that are near-duplicates of one another.
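
The following Python sketch illustrates one way the idea in the abstract could work. It is not the patented method: the abstract does not say how samples are chosen or how bit-strings are derived, so everything here is an assumption for illustration. Samples are taken to be overlapping word shingles, the "selected" sample for each of several seeded hash functions is the one with the minimum hash (min-wise hashing), and the bit-string is the lowest `bits` bits of that minimum. The names `shingles`, `sketch`, and `estimate_similarity` are hypothetical, and the chance-collision correction is an approximation.

```python
import hashlib


def shingles(text, k=3):
    """Divide a document into samples: overlapping k-word shingles (assumed scheme)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(sample, seed):
    """64-bit hash of a sample under a given seed (assumed hash family)."""
    data = f"{seed}:{sample}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def sketch(text, num_hashes=256, bits=2, k=3):
    """Build a sketch: for each seed, keep only the lowest `bits` bits
    of the minimum hash over all samples."""
    mask = (1 << bits) - 1
    samples = shingles(text, k)
    return [min(_hash(s, seed) for s in samples) & mask
            for seed in range(num_hashes)]


def estimate_similarity(sketch_a, sketch_b, bits=2):
    """Estimate similarity from the fraction of matching bit-strings.

    Unrelated samples still match with probability about 1 / 2**bits,
    so that chance rate is subtracted out (approximate correction).
    """
    matches = sum(x == y for x, y in zip(sketch_a, sketch_b))
    match_rate = matches / len(sketch_a)
    chance = 1 / (1 << bits)
    return max(0.0, (match_rate - chance) / (1 - chance))


doc_a = "the quick brown fox jumps over the lazy dog near the river bank"
doc_b = "the quick brown fox jumps over the lazy dog near the river shore"
doc_c = "stock markets closed higher today after a volatile trading session"

sk_a, sk_b, sk_c = (sketch(d) for d in (doc_a, doc_b, doc_c))
print("a vs b:", estimate_similarity(sk_a, sk_b))
print("a vs c:", estimate_similarity(sk_a, sk_c))
```

With `bits=2` and 256 hash functions, each sketch in this example occupies 512 bits, compared with 16,384 bits if the full 64-bit minimum hashes were stored. The trade-off is that shorter bit-strings collide by chance more often, so more hash functions may be needed for the same estimation accuracy.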

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.