Patent · US Active

Method and system for document similarity analysis based on common denominator similarity

US10248626B1 · kind B1 · utility

Cited by: 7
References: 2
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Sep 29, 2016
Grant date: Apr 2, 2019
Priority date
Expiry date: Dec 5, 2037

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/93
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method for document similarity analysis. The method includes obtaining a document to be archived and identifying a document category similar to the document to be archived. The similar document category is identified by: identifying a document category that includes indexing terms that are identical to indexing terms in the document to be archived; obtaining term frequency vectors for the identical indexing terms in the document to be archived and in the identified document category; generating normalized term frequency vectors from the term frequency vectors; calculating a common denominator similarity based on the normalized term frequency vectors and a common denominator; and determining that the document category is similar to the document to be archived based on the common denominator similarity. The method further includes registering the document to be archived in the document category.
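
The steps in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the claimed method: the abstract does not state how the vectors are normalized, what the common denominator is, or how the similarity is thresholded. The code assumes L2 normalization, uses the larger of the two vocabularies as a stand-in common denominator, and picks an arbitrary threshold of 0.5. All function names (shared_terms, placeholder_similarity, archive) are hypothetical.

```python
from collections import Counter
from math import sqrt


def shared_terms(doc_counts, cat_counts):
    """Indexing terms that appear in both the document and the category."""
    return sorted(set(doc_counts) & set(cat_counts))


def normalize(vec):
    """L2-normalize a term frequency vector (an assumed normalization)."""
    norm = sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else list(vec)


def placeholder_similarity(doc_counts, cat_counts):
    """Stand-in score; the patent's actual formula is not given in the abstract."""
    terms = shared_terms(doc_counts, cat_counts)
    if not terms:
        return 0.0
    doc_vec = normalize([doc_counts[t] for t in terms])
    cat_vec = normalize([cat_counts[t] for t in terms])
    overlap = sum(a * b for a, b in zip(doc_vec, cat_vec))
    # Assumed common denominator: size of the larger vocabulary.
    denominator = max(len(doc_counts), len(cat_counts))
    return overlap * len(terms) / denominator


def archive(doc_id, doc_counts, categories, registry, threshold=0.5):
    """Register the document in the best-scoring category, if any clears the threshold."""
    best_name, best_score = None, 0.0
    for name, cat_counts in categories.items():
        score = placeholder_similarity(doc_counts, cat_counts)
        if score > best_score:
            best_name, best_score = name, score
    if best_name is not None and best_score >= threshold:
        registry.setdefault(best_name, []).append(doc_id)
        return best_name
    return None


# Toy data, invented for illustration.
document = Counter({"invoice": 2, "payment": 1})
categories = {
    "billing": Counter({"invoice": 4, "payment": 2}),
    "travel": Counter({"flight": 1}),
}
registry = {}
chosen = archive("doc-1", document, categories, registry)
```

In this toy run the document shares both of its terms with the "billing" category and none with "travel", so it is registered under "billing". Swapping in a different normalization or denominator only requires changing placeholder_similarity.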

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.