Patent · US Active

Learning thematic similarity metric from article text units

US10831793B2 · kind B2 · utility

Cited by: 1
References: 3
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Oct 23, 2018
Grant date: Nov 10, 2020
Priority date
Expiry date: Feb 14, 2039

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06N3/044
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method of estimating a thematic similarity of sentences, comprising: receiving a corpus of a plurality of documents describing a plurality of topics, where each document comprises a plurality of sentences arranged in a plurality of sections; constructing sentence triplets for at least some of the sentences, each sentence triplet comprising a respective sentence, a respective positive sentence selected randomly from the section comprising the respective sentence, and a respective negative sentence selected randomly from another section; training a first neural network with the sentence triplets to identify sentence-sentence vectors mapping each sentence with a shorter distance to its respective positive sentence compared to the distance to its respective negative sentence; and outputting the first neural network for estimating thematic similarity between a pair of sentences by computing a distance between the sentence-sentence vectors produced for each sentence of the pair by the first neural network.
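
The triplet construction and the distance constraint described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation. The names `build_triplets`, `euclidean`, and `triplet_loss` are hypothetical, as are the nested-list corpus layout and the `margin` parameter. The sketch assumes that the negative sentence comes from a different section of the same document, that distance is Euclidean, and that the training objective is a standard margin-based triplet loss; the abstract specifies none of these.

```python
import math
import random


def build_triplets(documents, rng):
    """Build (sentence, positive, negative) triplets.

    documents: list of documents; each document is a list of sections;
    each section is a list of sentence strings (assumed layout).
    """
    triplets = []
    for sections in documents:
        for i, section in enumerate(sections):
            # Assumption: negatives come from the other sections of the same document.
            others = [s for j, sec in enumerate(sections) if j != i for s in sec]
            if len(section) < 2 or not others:
                continue  # no positive or no negative candidate available
            for k, anchor in enumerate(section):
                # Positive: a random different sentence from the same section.
                positive = rng.choice(section[:k] + section[k + 1:])
                # Negative: a random sentence from another section.
                negative = rng.choice(others)
                triplets.append((anchor, positive, negative))
    return triplets


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor_vec, positive_vec, negative_vec, margin=1.0):
    """Margin-based triplet loss (an assumed objective, not taken from the patent).

    The loss is zero once the positive is closer to the anchor than the
    negative by at least `margin`.
    """
    d_pos = euclidean(anchor_vec, positive_vec)
    d_neg = euclidean(anchor_vec, negative_vec)
    return max(0.0, d_pos - d_neg + margin)


if __name__ == "__main__":
    # Toy corpus: one document with two sections.
    corpus = [[["a1", "a2", "a3"], ["b1", "b2"]]]
    for triplet in build_triplets(corpus, random.Random(0)):
        print(triplet)
```

In a full system, the vectors passed to `triplet_loss` would be produced by the neural network for each sentence of a triplet, and the same distance function would score a new sentence pair after training.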

Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.