Patent · US Expired

Document similarity detection

US7734627B1 · kind B1 · utility

Cited by: 61
References: 30
Claims: 20
Family size: 0

Assignee

Inventor

Key dates

Filing date: Jun 17, 2003
Grant date: Jun 8, 2010
Priority date
Expiry date: Jul 20, 2025

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/319
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A similarity detector detects similar or near-duplicate occurrences of a document. The similarity detector determines similarity of documents by characterizing the documents as clusters, each made up of a set of term entries, such as pairs of terms. A pair of terms, for example, indicates that the first term of the pair occurs before the second term of the pair in the underlying document. Another document that has a threshold level of term entries in common with a cluster is considered similar to the document characterized by the cluster.
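
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the tokenizer, the use of every ordered pair of distinct terms, the function names `term_pairs` and `is_similar`, and the reading of "threshold level" as a fraction of the cluster's entries (default 0.8) are all assumptions made for illustration.

```python
import re
from itertools import combinations


def term_pairs(text):
    """Return the set of ordered pairs (a, b) where term a occurs before term b.

    Assumption: terms are lowercase alphanumeric tokens, and every pair of
    distinct terms is kept. The abstract does not say how terms are selected.
    """
    terms = re.findall(r"[a-z0-9]+", text.lower())
    pairs = set()
    # combinations() yields index pairs with i < j, so terms[i] precedes terms[j].
    for i, j in combinations(range(len(terms)), 2):
        if terms[i] != terms[j]:
            pairs.add((terms[i], terms[j]))
    return pairs


def is_similar(cluster, other_text, threshold=0.8):
    """Return True if other_text shares at least `threshold` of the cluster's pairs.

    Assumption: the threshold is a fraction of the cluster size; the abstract
    only says "a threshold level of term entries in common".
    """
    if not cluster:
        return False
    shared = len(cluster & term_pairs(other_text))
    return shared / len(cluster) >= threshold


# The cluster characterizes one document; other documents are tested against it.
cluster = term_pairs("the quick brown fox")
print(is_similar(cluster, "the quick brown fox jumps"))  # same terms, same order
print(is_similar(cluster, "fox brown quick the"))        # same terms, reversed order
```

Because each entry records order, the reversed document shares no entries with the cluster even though it contains the same terms, which is what separates this characterization from a plain bag of words.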

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.