Language model optimization for in-domain application
US9972311B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | May 7, 2014 |
| Grant date | May 15, 2018 |
| Priority date | — |
| Expiry date | May 7, 2034 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/295
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Systems and methods are provided for optimizing language models for in-domain applications through an iterative, joint-modeling approach that expresses training material as alternative representations of higher-level tokens, such as named entities and carrier phrases. From a first language model, an in-domain training corpus may be represented as a set of alternative parses of tokens. Statistical information determined from these parsed representations may be used to produce a second (or updated) language model, which is further optimized for the domain. The second language model may be used to determine another alternative parsed representation of the corpus for a next iteration, and the statistical information determined from this representation may be used to produce a third (or further updated) language model. Through each iteration, a language model may be determined that is further optimized for the domain.
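The iterative loop described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation, and it rests on simplifying assumptions: the language model is a plain unigram distribution over tokens, the entity lexicon `ENTITIES` and the `<CITY>` class token are hypothetical, and carrier phrases are left out. Each iteration enumerates the alternative parses of every sentence, weights them under the current model, and re-estimates the model from the resulting fractional counts.

```python
from collections import defaultdict

# Hypothetical entity lexicon: word spans that may be collapsed into a class token.
ENTITIES = {("new", "york"): "<CITY>", ("boston",): "<CITY>"}

def parses(words):
    """Enumerate every segmentation of `words` into plain words and entity tokens."""
    if not words:
        return [[]]
    # Option 1: keep the first word as an ordinary word token.
    results = [[words[0]] + rest for rest in parses(words[1:])]
    # Option 2: collapse a leading span found in the lexicon into its class token.
    for span, label in ENTITIES.items():
        n = len(span)
        if tuple(words[:n]) == span:
            results += [[label] + rest for rest in parses(words[n:])]
    return results

def parse_prob(tokens, model, floor=1e-6):
    """Unigram probability of one parse (an assumed stand-in for an n-gram model)."""
    p = 1.0
    for tok in tokens:
        p *= model.get(tok, floor)
    return p

def reestimate(corpus, model):
    """Build the next model from parse-weighted fractional token counts."""
    counts = defaultdict(float)
    for sentence in corpus:
        alternatives = parses(sentence)
        weights = [parse_prob(alt, model) for alt in alternatives]
        total = sum(weights)
        for alt, weight in zip(alternatives, weights):
            for tok in alt:
                counts[tok] += weight / total  # posterior weight of this parse
    norm = sum(counts.values())
    return {tok: c / norm for tok, c in counts.items()}

def optimize(corpus, iterations=3):
    """Start from a uniform first model and refine it once per iteration."""
    vocab = {tok for s in corpus for alt in parses(s) for tok in alt}
    model = {tok: 1.0 / len(vocab) for tok in vocab}
    for _ in range(iterations):
        model = reestimate(corpus, model)
    return model

corpus = [s.split() for s in
          ["flights to new york", "flights to boston", "weather in boston"]]
model = optimize(corpus)
```

In this toy corpus the class token `<CITY>` gains probability mass with each pass, because parses that use it explain more of the in-domain sentences than the word-by-word alternatives. A unigram model also favors shorter parses, which is an artifact of the simplification rather than a property claimed by the patent.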
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.