Method and apparatus for building a language model
US9396724B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Feb 14, 2014 |
| Grant date | Jul 19, 2016 |
| Priority date | — |
| Expiry date | Sep 29, 2034 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L15/197
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method includes: acquiring data samples; performing categorized sentence mining in the acquired data samples to obtain categorized training samples for multiple categories; building a text classifier based on the categorized training samples; classifying the data samples using the text classifier to obtain a class vocabulary and a corpus for each category; mining the corpus for each category according to the class vocabulary for the category to obtain a respective set of high-frequency language templates; training on the templates for each category to obtain a template-based language model for the category; training on the corpus for each category to obtain a class-based language model for the category; training on the class vocabulary for each category to obtain a lexicon-based language model for the category; and building a speech decoder according to an acoustic model, the class-based language model and the lexicon-based language model for any given field, and the data samples.
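The first steps of the abstract (categorized sentence mining, text classification, per-category corpora, and template mining) can be illustrated with a minimal Python sketch. It is not the patented implementation. The seed keywords, the toy sentences, the hand-listed class vocabularies, the `<SLOT>` token, and all function names are hypothetical. A keyword match stands in for categorized sentence mining, and a naive Bayes model with add-one smoothing stands in for the text classifier, since the abstract names neither technique. Language-model training and the speech decoder are left out.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy inputs; none of these values come from the patent.
SEEDS = {"music": {"song", "play"}, "weather": {"rain", "forecast"}}
CLASS_VOCAB = {"music": {"yesterday", "imagine"}, "weather": {"paris", "london"}}
SAMPLES = [
    "play the song yesterday",
    "play the song imagine",
    "play imagine again",
    "forecast for rain in paris",
    "forecast for rain in london",
    "will it rain in paris",
    "in london",  # contains no seed word, so only the classifier can place it
]

def mine_seed_sentences(samples, seeds):
    """Stand-in for categorized sentence mining: keep sentences matching one category's seeds."""
    labeled = []
    for sentence in samples:
        words = set(sentence.split())
        hits = [cat for cat, keywords in seeds.items() if words & keywords]
        if len(hits) == 1:  # drop unmatched or ambiguous sentences
            labeled.append((sentence, hits[0]))
    return labeled

def train_naive_bayes(labeled):
    """Collect per-category word counts and category counts from labeled sentences."""
    word_counts, class_counts = defaultdict(Counter), Counter()
    for sentence, cat in labeled:
        class_counts[cat] += 1
        word_counts[cat].update(sentence.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab

def classify(sentence, model):
    """Return the category with the highest add-one-smoothed log probability."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best, best_lp = None, -math.inf
    for cat in sorted(class_counts):
        lp = math.log(class_counts[cat] / total)
        denom = sum(word_counts[cat].values()) + len(vocab)
        for w in sentence.split():
            lp += math.log((word_counts[cat][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = cat, lp
    return best

def build_corpora(samples, model):
    """Group every data sample under the category the classifier assigns to it."""
    corpora = defaultdict(list)
    for sentence in samples:
        corpora[classify(sentence, model)].append(sentence)
    return corpora

def mine_templates(corpus, class_vocab, slot="<SLOT>", min_count=2):
    """Replace class-vocabulary words with a slot token and keep frequent patterns."""
    counts = Counter(
        " ".join(slot if w in class_vocab else w for w in sentence.split())
        for sentence in corpus
    )
    return {template: n for template, n in counts.items() if n >= min_count}

model = train_naive_bayes(mine_seed_sentences(SAMPLES, SEEDS))
corpora = build_corpora(SAMPLES, model)
templates = {cat: mine_templates(corpora[cat], CLASS_VOCAB[cat]) for cat in corpora}
print(templates)
```

With this toy data, the sentence "in london" matches no seed word but is still assigned to the weather corpus by the classifier, and the only music template that reaches `min_count` is `play the song <SLOT>`. In the method as described, the class vocabulary comes out of the classification step; it is hand-listed here only to keep the sketch short.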
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.