Patent · US Expired

Method for determining the resemining the resemblance of documents

US6230155A · kind A · utility

50Cited by
5References
1Claims
0Family size

Assignee

Inventors

Key dates

Filing dateNov 23, 1998
Grant dateMay 8, 2001
Priority date
Expiry dateNov 23, 2018

Classification

  • Technology area (CPC Y)Emerging Cross-Sectional Technologies
  • CPC primaryY10S707/99953
  • WIPO fieldComputer technology
  • WIPO sectorElectrical engineering

Abstract

A method for facilitating the comparison of two computerized documents. The method includes loading a first document into a random access memory (RAM), loading a second document into the RAM, reducing the first document into a first sequence of tokens, reducing the second document into a second sequence of tokens, converting the first set of tokens to a first (multi)set of shingles, converting the second set of tokens to a second (multi)set of shingles, determining a first sketch of the first (multi)set of shingles, determining a second sketch of the second (multi)set of shingles, and comparing the first sketch and the second sketch. The sketches have a fixed size, independent of the size of the documents. The resemblance of two documents is provided using a sketch of each document. The sketches may be computed fairly fast and given two sketches the resemblance of the corresponding documents can be computed in linear time in the size of the sketches.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.