Document similarity detection
US7734627B1 · kind B1 · utility
Assignee
Inventor
Key dates
| Filing date | Jun 17, 2003 |
| Grant date | Jun 8, 2010 |
| Priority date | — |
| Expiry date | Jul 20, 2025 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/319
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A similarity detector detects similar or near-duplicate occurrences of a document. The similarity detector determines the similarity of documents by characterizing each document as a cluster made up of a set of term entries, such as pairs of terms. A pair of terms, for example, indicates that the first term of the pair occurs before the second term of the pair in the underlying document. Another document that has a threshold level of term entries in common with a cluster is considered similar to the document characterized by the cluster.
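The idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names `term_pairs` and `is_similar`, the whitespace tokenization, the use of every ordered pair of distinct terms, and the 0.7 threshold are all assumptions made for illustration; the abstract only states that a term pair records one term occurring before another and that similarity requires a threshold level of shared entries.

```python
from itertools import combinations


def term_pairs(text: str) -> set[tuple[str, str]]:
    """Build a cluster: the set of ordered pairs (a, b) where term a
    occurs somewhere before term b in the text.

    Assumption: terms are lowercase whitespace-separated tokens, and
    every ordered pair of distinct terms is kept.
    """
    terms = text.lower().split()
    # combinations() preserves input order, so a always precedes b.
    return {(a, b) for a, b in combinations(terms, 2) if a != b}


def is_similar(cluster: set[tuple[str, str]], candidate_text: str,
               threshold: float = 0.7) -> bool:
    """Return True if the candidate shares at least `threshold` of the
    cluster's term pairs. The 0.7 default is an arbitrary illustrative
    value, not one taken from the patent.
    """
    if not cluster:
        return False
    shared = cluster & term_pairs(candidate_text)
    return len(shared) / len(cluster) >= threshold


# Usage: characterize one document, then test two candidates against it.
cluster = term_pairs("the quick brown fox jumps over the lazy dog")
near_duplicate = "the quick brown fox leaps over the lazy dog"
unrelated = "stock markets fell sharply on tuesday"
```

Because the pairs encode relative order, a document that reuses the same words in a shuffled order shares fewer entries with the cluster than a true near duplicate does, which a plain bag-of-words comparison would not capture.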
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.