System and method for natural language processing using synthetic text
US10025773B2 · kind B2 · utility
Key dates
| Event | Date |
|---|---|
| Filing date | Jul 24, 2015 |
| Grant date | Jul 17, 2018 |
| Priority date | — |
| Expiry date | Sep 16, 2035 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/56
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method for performing natural language processing includes receiving a primary text file. The received primary text file is scanned to determine a set of statistics related to a frequency at which various words of the primary text file follow other words of the primary text file. A probabilistic word generator is created based on the determined set of statistics. The probabilistic word generator generates synthetic text exhibiting the determined set of statistics. Synthetic text exhibiting the determined set of statistics is generated using the created probabilistic word generator. Word vectorization is performed on the synthetic text. Results of the performed vectorization are used to perform machine learning tasks.
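The pipeline in the abstract (count word-follows-word frequencies, build a probabilistic generator from them, sample synthetic text, then vectorize it) can be illustrated with a minimal sketch. It rests on several assumptions the abstract does not state: the statistics are first-order bigram counts, tokenization is lowercase whitespace splitting, and the generator is a first-order Markov chain. The vectorization step uses plain windowed co-occurrence counts as a stand-in; the abstract does not name a vectorization method, and a learned embedding such as word2vec would be a more typical choice. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def bigram_statistics(text):
    """Count how often each word is immediately followed by each other word.

    Assumption: lowercase whitespace tokenization, first-order (bigram) statistics.
    """
    words = text.lower().split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows


def generate_synthetic_text(follows, n_words, seed=None):
    """Sample words from a first-order Markov chain weighted by the bigram counts."""
    vocab = sorted(follows)  # words observed with at least one successor
    if n_words <= 0 or not vocab:
        return []
    rng = random.Random(seed)
    word = rng.choice(vocab)
    out = [word]
    for _ in range(n_words - 1):
        successors = follows.get(word)  # .get avoids inserting keys into the defaultdict
        if not successors:
            # Dead end: the word was never followed by anything, so restart at random.
            word = rng.choice(vocab)
        else:
            candidates = list(successors)
            weights = [successors[w] for w in candidates]
            word = rng.choices(candidates, weights=weights, k=1)[0]
        out.append(word)
    return out


def cooccurrence_vectors(words, window=2):
    """Stand-in vectorization: a word's vector counts its neighbors within `window`."""
    vocab = sorted(set(words))
    index = {w: i for i, w in enumerate(vocab)}
    vectors = {w: [0] * len(vocab) for w in vocab}
    for i, w in enumerate(words):
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[w][index[words[j]]] += 1
    return vectors


corpus = "the cat sat on the mat the cat ate the rat"
stats = bigram_statistics(corpus)
synthetic = generate_synthetic_text(stats, 200, seed=1)
vectors = cooccurrence_vectors(synthetic)
```

Because every sampled transition is drawn in proportion to the observed counts, a long synthetic sequence approximately reproduces the bigram frequencies of the primary text, which is the property the abstract relies on before vectorization. The resulting vectors (here, one count list per word) would then be passed to whatever downstream machine-learning task is of interest.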
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.