Patent · US Active

System and method for natural language processing using synthetic text

US10025773B2 · kind B2 · utility

Cited by: 9
References: 1
Claims: 19
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jul 24, 2015
Grant date: Jul 17, 2018
Priority date
Expiry date: Sep 16, 2035

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/56
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method for performing natural language processing includes receiving a primary text file. The received primary text file is scanned to determine a set of statistics related to a frequency at which various words of the primary text file follow other words of the primary text file. A probabilistic word generator is created based on the determined set of statistics. The probabilistic word generator generates synthetic text exhibiting the determined set of statistics. Synthetic text exhibiting the determined set of statistics is generated using the created probabilistic word generator. Word vectorization is performed on the synthetic text. Results of the performed vectorization are used to perform machine learning tasks.
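
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the "set of statistics" is a table of bigram (word-follows-word) counts, that the probabilistic word generator is a first-order Markov chain sampled from those counts, and that simple co-occurrence counting stands in for word vectorization. All function names (`bigram_counts`, `generate`, `cooccurrence_vectors`) and the sample text are hypothetical, and the final machine learning step is left out.

```python
import random
from collections import Counter, defaultdict


def bigram_counts(text):
    """Scan the primary text and count how often each word follows another."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, start, length, rng):
    """Sample synthetic text whose word transitions follow the bigram counts.

    Assumption: a first-order Markov chain; the abstract does not fix the model.
    Generation stops early if the current word was never followed by anything.
    """
    out = [start]
    for _ in range(length - 1):
        followers = counts.get(out[-1])
        if not followers:
            break
        candidates = list(followers)
        weights = list(followers.values())
        out.append(rng.choices(candidates, weights=weights, k=1)[0])
    return out


def cooccurrence_vectors(tokens, window=2):
    """Build one count vector per word from its neighbours within `window`.

    Assumption: a simple stand-in for whatever vectorization is actually used.
    """
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    vectors = {w: [0] * len(vocab) for w in vocab}
    for i, w in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[w][index[tokens[j]]] += 1
    return vocab, vectors


# Hypothetical primary text, used only for illustration.
primary = "the cat sat on the mat and the dog sat on the rug"
counts = bigram_counts(primary)
synthetic = generate(counts, "the", 20, random.Random(0))
vocab, vectors = cooccurrence_vectors(synthetic)
```

The resulting `vectors` could then be passed to a downstream learner. Seeding the generator through `random.Random(0)` keeps the synthetic text reproducible between runs.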

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.