Patent · US · Active

Method and apparatus for processing text with variations in vocabulary usage

US9251250B2 · kind B2 · utility

Cited by	4

References	6

Claims	12

Family size	0

Assignee

Mitsubishi Electric Research Laboratories, Inc. · US

Inventors

John R. Hershey · Winchester, US
Jonathan Le Roux · Somerville, US
Creighton K Heakulani · Cambridge, GB

Key dates

Filing date	Mar 28, 2012
Grant date	Feb 2, 2016
Priority date	—
Expiry date	May 19, 2034

Classification

Technology area (CPC G)	Physics
CPC primary	G06F40/30
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Text is processed to construct a model of the text. The text has a shared vocabulary. The text is partitioned into sets and subsets of texts. The usage of the shared vocabulary in two or more sets is different, and the topics of two or more subsets are different. A probabilistic model is defined for the text. The probabilistic model considers each word in the text to be a token having a position and a word value, and the usage of the shared vocabulary, topics, subtopics, and word values for each token in the text are represented using distributions of random variables in the probabilistic model, wherein the random variables are discrete. Parameters are estimated for the model corresponding to the vocabulary usages, the word values, the topics, and the subtopics associated with the words.
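
Illustrative sketch (not taken from the patent text): the abstract describes each word as a token with a position and a word value, with vocabulary usage and topics represented by discrete random variables. The Python snippet below shows one way such a generative process might look, assuming a single topic level and hand-picked probabilities. The names `sample_categorical`, `generate_tokens`, `WORD_PROBS`, and `TOPIC_PROBS`, the toy vocabulary, and all numeric values are hypothetical. Subtopics and parameter estimation are omitted.

```python
import random


def sample_categorical(probs, rng):
    """Draw an index from a discrete distribution given as a list of weights."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_tokens(usage, topic_probs, word_probs, vocabulary, length, rng):
    """Generate tokens for one text under a fixed vocabulary usage (assumed model).

    Each token has a position, a discrete topic, and a word value whose
    distribution depends on both the vocabulary usage and the topic.
    """
    tokens = []
    for position in range(length):
        topic = sample_categorical(topic_probs, rng)
        word_index = sample_categorical(word_probs[usage][topic], rng)
        tokens.append(
            {"position": position, "topic": topic, "word": vocabulary[word_index]}
        )
    return tokens


# Hypothetical shared vocabulary: the same words are available to every set,
# but each usage prefers different words for the same topic.
VOCABULARY = ["lift", "elevator", "boot", "trunk"]

# WORD_PROBS[usage][topic] is a distribution over VOCABULARY (made-up values).
WORD_PROBS = [
    [[0.9, 0.1, 0.0, 0.0], [0.0, 0.0, 0.9, 0.1]],  # usage 0
    [[0.1, 0.9, 0.0, 0.0], [0.0, 0.0, 0.1, 0.9]],  # usage 1
]

# Distribution over the two topics for the text being generated (made-up).
TOPIC_PROBS = [0.5, 0.5]

rng = random.Random(0)
tokens = generate_tokens(0, TOPIC_PROBS, WORD_PROBS, VOCABULARY, 5, rng)
for token in tokens:
    print(token)
```

In this toy setup both usages share the four-word vocabulary, yet the same topic yields different likely word values depending on the usage, which is the kind of variation the abstract refers to. Estimating such probabilities from data would be a separate step not shown here.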

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.