Method and system for document similarity analysis based on common denominator similarity
US10248626B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Event | Date |
| --- | --- |
| Filing date | Sep 29, 2016 |
| Grant date | Apr 2, 2019 |
| Priority date | — |
| Expiry date | Dec 5, 2037 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/93
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method for document similarity analysis. The method includes obtaining a document to be archived and identifying a document category similar to the document to be archived. The similar document category is identified by: identifying a document category that includes indexing terms identical to indexing terms in the document to be archived; obtaining term frequency vectors for the identical indexing terms in the document to be archived and in the identified document category; generating normalized term frequency vectors from the term frequency vectors; calculating a common denominator similarity based on the normalized term frequency vectors and a common denominator; and determining that the document category is similar to the document to be archived based on the common denominator similarity. The method further includes registering the document to be archived in the document category.
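The steps in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the vectors are normalized, how the common denominator is defined, or how the similarity decision is made, so everything specific here is an assumption made for illustration: unit-length (Euclidean) normalization, a common denominator equal to the fraction of the document's distinct indexing terms that also occur in the category, a fixed score threshold, and the hypothetical names `common_denominator_similarity`, `register_document`, `categories`, and `registry`. It should not be read as the claimed method.

```python
import math
from collections import Counter


def common_denominator_similarity(doc_terms, category_terms):
    """Illustrative score only; normalization and weighting are assumptions."""
    doc_tf = Counter(doc_terms)
    cat_tf = Counter(category_terms)

    # Indexing terms that are identical in the document and the category.
    shared = sorted(set(doc_tf) & set(cat_tf))
    if not shared:
        return 0.0

    # Term frequency vectors restricted to the identical terms.
    doc_vec = [doc_tf[t] for t in shared]
    cat_vec = [cat_tf[t] for t in shared]

    # Assumption: normalize each vector to unit Euclidean length.
    def normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec]

    doc_n, cat_n = normalize(doc_vec), normalize(cat_vec)
    cosine = sum(a * b for a, b in zip(doc_n, cat_n))

    # Assumption: the common denominator is the share of the document's
    # distinct indexing terms that also appear in the category.
    common_denominator = len(shared) / len(doc_tf)
    return cosine * common_denominator


def register_document(doc_id, doc_terms, categories, registry, threshold=0.5):
    """Register doc_id in the best-scoring category, if it clears the threshold."""
    best_name, best_score = None, 0.0
    for name, terms in categories.items():
        score = common_denominator_similarity(doc_terms, terms)
        if score > best_score:
            best_name, best_score = name, score
    if best_name is not None and best_score >= threshold:
        registry.setdefault(best_name, []).append(doc_id)
        return best_name
    return None


# Hypothetical data: two categories described by their indexing terms.
categories = {"invoices": ["a", "a", "b"], "memos": ["x", "y"]}
registry = {}
chosen = register_document("d1", ["a", "a", "b", "c"], categories, registry)
```

In this sketch, weighting the vector similarity by the shared-term fraction keeps a category from scoring highly when it matches the document on only a small part of its vocabulary, which is one plausible reading of why a common denominator would be combined with the normalized vectors.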
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.