Patent · US Active

Method and system for assessing similarity of documents

US9852337B1 · kind B1 · utility

Cited by: 13
References: 4
Claims: 30
Family size: 0

Assignee

Inventors

Key dates

Filing date: Sep 30, 2015
Grant date: Dec 26, 2017
Priority date
Expiry date: Dec 23, 2035

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06V2201/10
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method for assessing similarity of documents. The method includes extracting a reference document text from a reference document, extracting an archived document text from an archived document, and quantifying the reference document and the archived document. Quantifying the reference and archived documents includes tokenizing sentences of the reference document and archived document, respectively, and vectorizing the tokenized sentences to obtain a reference document text vector and an archived document text vector for each sentence of the reference and archived document, respectively. The method also includes determining a document similarity value of the quantified reference document and the quantified archived document. Determining the document similarity value includes calculating a set of vector similarity values for a set of combinations of a reference document text vector and an archived document text vector, and calculating the document similarity value, including a sum of the plurality of vector similarity values.
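The steps described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not say how sentences are vectorized, which similarity measure is used, or which combinations of vectors are compared. The sketch assumes bag-of-words count vectors, cosine similarity, and all pairwise combinations of reference and archived sentences. The function names (tokenize_sentences, vectorize, cosine, document_similarity) are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import product


def tokenize_sentences(text):
    """Split text into sentences, then each sentence into lowercase word tokens."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    tokenized = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    return [tokens for tokens in tokenized if tokens]  # drop empty sentences


def vectorize(tokens):
    """Assumed vectorization: a sparse bag-of-words count vector."""
    return Counter(tokens)


def cosine(u, v):
    """Assumed vector similarity: cosine similarity of two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def document_similarity(reference_text, archived_text):
    """Sum the vector similarity values over all sentence-vector pairs."""
    ref_vectors = [vectorize(t) for t in tokenize_sentences(reference_text)]
    arc_vectors = [vectorize(t) for t in tokenize_sentences(archived_text)]
    # Assumption: the "set of combinations" is every (reference, archived) pair.
    return sum(cosine(r, a) for r, a in product(ref_vectors, arc_vectors))


if __name__ == "__main__":
    reference = "The cat sat. Dogs bark."
    archived = "The cat sat. Birds sing."
    print(document_similarity(reference, archived))
```

Because the score is a plain sum over sentence pairs, it grows with document length. A practical system would likely normalize it or restrict which pairs are compared, but the abstract does not describe such a step.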

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.