System and method for productive generation of compound words in statistical machine translation
US8781810B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jul 25, 2011 |
| Grant date | Jul 15, 2014 |
| Priority date | — |
| Expiry date | Dec 13, 2032 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/44
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method and a system for making merging decisions for a translation are disclosed, suited to use where the target language is a productive compounding one. The method includes outputting, with a merging system, decisions on whether to merge pairs of words in a translated text string. The merging system can include a set of stored heuristics and/or a merging model. The heuristics can include one by which two consecutive words in the string are considered for merging if the first of the two is recognized as a compound modifier and the observed frequency f1 of the two words as a closed compound word is larger than the observed frequency f2 of the two words as a bigram. The merging model can be one that is trained on features associated with pairs of consecutive tokens of text strings in a training set, together with predetermined merging decisions for those pairs. A translation in the target language is output, based on the merging decisions for the translated text string.
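The frequency heuristic named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the modifier set, the two frequency tables, and the German-like example tokens are all hypothetical. Joining the two tokens by plain concatenation is also an assumption, since the abstract does not say how the closed compound is formed.

```python
from collections import Counter


def merge_decisions(tokens, modifiers, compound_freq, bigram_freq):
    """Return one boolean per adjacent token pair; True means 'merge'.

    A pair is merged when the first token is a known compound modifier
    and the closed-compound frequency f1 exceeds the bigram frequency f2.
    """
    decisions = []
    for first, second in zip(tokens, tokens[1:]):
        f1 = compound_freq[first + second]   # frequency as a closed compound
        f2 = bigram_freq[(first, second)]    # frequency as two separate words
        decisions.append(first in modifiers and f1 > f2)
    return decisions


def apply_merges(tokens, decisions):
    """Concatenate each token onto its predecessor where the decision is True."""
    if not tokens:
        return []
    merged = [tokens[0]]
    for token, merge in zip(tokens[1:], decisions):
        if merge:
            merged[-1] += token
        else:
            merged.append(token)
    return merged


# Hypothetical data: a split translation output and toy corpus counts.
tokens = ["die", "regierungs", "konferenz", "beginnt"]
modifiers = {"regierungs"}
compound_freq = Counter({"regierungskonferenz": 12})
bigram_freq = Counter({("regierungs", "konferenz"): 1})

decisions = merge_decisions(tokens, modifiers, compound_freq, bigram_freq)
print(apply_merges(tokens, decisions))
# prints ['die', 'regierungskonferenz', 'beginnt']
```

Using `Counter` means unseen compounds and bigrams count as zero, so a pair with no corpus evidence is never merged under this rule. The trained merging model that the abstract offers as an alternative is not covered by this sketch.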
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.