Patent · US Active

Document processing employing probabilistic topic modeling of documents represented as text words transformed to a continuous space

US9430563B2 · kind B2 · utility

Cited by: 11
References: 7
Claims: 24
Family size: 0

Assignee

Inventors

Key dates

Filing date: Feb 2, 2012
Grant date: Aug 30, 2016
Priority date
Expiry date: Jul 16, 2034

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06V30/10
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A set of word embedding transforms are applied to transform text words of a set of documents into K-dimensional word vectors in order to generate sets or sequences of word vectors representing the documents of the set of documents. A probabilistic topic model is learned using the sets or sequences of word vectors representing the documents of the set of documents. The set of word embedding transforms are applied to transform text words of an input document into K-dimensional word vectors in order to generate a set or sequence of word vectors representing the input document. The learned probabilistic topic model is applied to assign probabilities for topics of the probabilistic topic model to the set or sequence of word vectors representing the input document. A document processing operation such as annotation, classification, or similar document retrieval may be performed using the assigned topic probabilities.
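
The pipeline in the abstract (embed words, learn a topic model over the word vectors, then assign topic probabilities to a new document) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the patent: the 2-dimensional embedding table `EMBEDDINGS` is hand-made, the topic model is a simple Gaussian mixture with a fixed shared variance fitted by EM, and the names `embed`, `fit_topics`, and `topic_probabilities` are hypothetical.

```python
import math

# Hypothetical K=2 embedding table (assumption: a real system would learn these).
EMBEDDINGS = {
    "goal": [0.90, 0.10], "match": [1.00, 0.00], "team": [0.95, 0.05],
    "coach": [0.85, 0.10], "bank": [0.10, 0.90], "loan": [0.00, 1.00],
    "rate": [0.05, 0.95], "credit": [0.10, 0.85],
}

def embed(words, table=EMBEDDINGS):
    """Transform text words into K-dimensional vectors; unknown words are skipped."""
    return [table[w] for w in words if w in table]

def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def responsibilities(x, weights, means, var):
    """Posterior probability of each topic (mixture component) for one word vector."""
    k = len(x)
    logs = [math.log(w) - 0.5 * (k * math.log(2 * math.pi * var) + sq_dist(x, m) / var)
            for w, m in zip(weights, means)]
    top = max(logs)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]

def fit_topics(vectors, n_topics, var=0.05, n_iter=20):
    """Fit mixture weights and means by EM (assumed stand-in for the topic model)."""
    # Deterministic farthest-point initialisation of the topic means.
    means = [list(vectors[0])]
    while len(means) < n_topics:
        far = max(vectors, key=lambda v: min(sq_dist(v, m) for m in means))
        means.append(list(far))
    weights = [1.0 / n_topics] * n_topics
    dim = len(vectors[0])
    for _ in range(n_iter):
        resp = [responsibilities(x, weights, means, var) for x in vectors]  # E-step
        for t in range(n_topics):                                           # M-step
            mass = sum(r[t] for r in resp)
            weights[t] = max(mass / len(vectors), 1e-12)  # floor avoids log(0)
            if mass > 0.0:
                means[t] = [sum(r[t] * x[d] for r, x in zip(resp, vectors)) / mass
                            for d in range(dim)]
    return weights, means

def topic_probabilities(words, weights, means, var=0.05):
    """Average the per-word topic posteriors over the input document."""
    vectors = embed(words)
    if not vectors:
        return list(weights)  # no known words: fall back to the mixture weights
    resp = [responsibilities(x, weights, means, var) for x in vectors]
    return [sum(r[t] for r in resp) / len(resp) for t in range(len(weights))]

# Training set of documents, each represented as a list of text words.
documents = [["team", "match", "goal"], ["bank", "loan", "rate"],
             ["coach", "team", "credit"]]
training_vectors = [v for doc in documents for v in embed(doc)]
weights, means = fit_topics(training_vectors, n_topics=2)

# Input document; "xyz" has no embedding and is ignored.
probs = topic_probabilities(["goal", "coach", "loan", "xyz"], weights, means)
predicted_topic = max(range(len(probs)), key=probs.__getitem__)
```

In this sketch `probs` plays the role of the assigned topic probabilities, and taking the most probable topic (`predicted_topic`) is one simple way a classification step could consume them. Annotation or similar-document retrieval would use the same vector of probabilities, for example by comparing `probs` across documents.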

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.