Method for determining the resemblance of documents
US6230155A · kind A · utility
Assignee
Inventors
Key dates
| Event | Date |
|---|---|
| Filing date | Nov 23, 1998 |
| Grant date | May 8, 2001 |
| Priority date | — |
| Expiry date | Nov 23, 2018 |
Classification
- Technology area (CPC Y): Emerging Cross-Sectional Technologies
- CPC primary: Y10S707/99953
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method for facilitating the comparison of two computerized documents. The method includes loading a first document into a random access memory (RAM), loading a second document into the RAM, reducing the first document to a first sequence of tokens, reducing the second document to a second sequence of tokens, converting the first sequence of tokens to a first (multi)set of shingles, converting the second sequence of tokens to a second (multi)set of shingles, determining a first sketch of the first (multi)set of shingles, determining a second sketch of the second (multi)set of shingles, and comparing the first sketch with the second sketch. The sketches have a fixed size, independent of the size of the documents, and the resemblance of two documents is determined from one sketch per document. The sketches can be computed quickly, and given two sketches, the resemblance of the corresponding documents can be computed in time linear in the size of the sketches.
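The abstract names the pipeline (tokens, shingles, fixed-size sketches, linear-time comparison) but not how a sketch is built. The Python code below is a minimal illustration under stated assumptions, not the patented method: it assumes tokens are lowercase alphanumeric words, uses a set rather than a multiset of w-token shingles, and builds each sketch from the k smallest 64-bit hash values of the shingles (a bottom-k min-wise sketch, with BLAKE2b standing in for a random permutation). It also assumes resemblance means the ratio |A ∩ B| / |A ∪ B| of shared to total distinct shingles. All names (`tokenize`, `shingles`, `sketch`, `resemblance`, `SHINGLE_WIDTH`, `SKETCH_SIZE`) and the values w = 4, k = 64 are hypothetical. The comparison walks the two sorted sketches once, so its cost is linear in their size.

```python
import hashlib
import re

# Hypothetical parameters: the abstract does not fix the shingle width or sketch size.
SHINGLE_WIDTH = 4   # tokens per shingle (w)
SKETCH_SIZE = 64    # fixed number of hash values kept per document (k)


def tokenize(text: str) -> list[str]:
    """Reduce a document to a sequence of lowercase alphanumeric tokens (assumed rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def shingles(tokens: list[str], w: int = SHINGLE_WIDTH) -> set[tuple[str, ...]]:
    """Contiguous w-token windows; a set, so a repeated shingle counts once."""
    if len(tokens) < w:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def _hash64(shingle: tuple[str, ...]) -> int:
    """Map a shingle to a 64-bit integer; stands in for a random permutation."""
    data = " ".join(shingle).encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def sketch(shingle_set: set, k: int = SKETCH_SIZE) -> list[int]:
    """Keep the k smallest distinct hash values, sorted ascending (fixed size for large docs)."""
    return sorted({_hash64(s) for s in shingle_set})[:k]


def resemblance(sk_a: list[int], sk_b: list[int], k: int = SKETCH_SIZE) -> float:
    """Estimate |A ∩ B| / |A ∪ B| from two sorted sketches in O(len(sk_a) + len(sk_b))."""
    i = j = seen = both = 0
    # Merge-walk the k smallest distinct values of the union of both sketches.
    while seen < k and (i < len(sk_a) or j < len(sk_b)):
        if j == len(sk_b) or (i < len(sk_a) and sk_a[i] < sk_b[j]):
            i += 1                      # value present only in sketch A
        elif i == len(sk_a) or sk_b[j] < sk_a[i]:
            j += 1                      # value present only in sketch B
        else:
            both += 1                   # value present in both sketches
            i += 1
            j += 1
        seen += 1
    return both / seen if seen else 1.0  # two empty documents: treated as identical


if __name__ == "__main__":
    a = shingles(tokenize("a rose is a rose is a rose"))
    b = shingles(tokenize("a rose is a flower which is a rose"))
    print(len(a), len(b))                     # 3 6
    print(resemblance(sketch(a), sketch(b)))  # 0.125 (1 shared shingle out of 8)
```

Because both sample documents have fewer than k distinct shingles, each sketch holds every hash value and the result equals the exact ratio. For larger documents the value is an estimate that tightens as k grows. A common alternative keeps one minimum under each of k independent hash functions, which costs k hash evaluations per shingle instead of one.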
Source: USPTO / EPO open patent data.