Patent · US Active

System and method for productive generation of compound words in statistical machine translation

US8781810B2 · kind B2 · utility

Cited by: 8
References: 9
Claims: 22
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jul 25, 2011
Grant date: Jul 15, 2014
Priority date
Expiry date: Dec 13, 2032

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/44
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method and a system for making merging decisions for a translation are disclosed which are suited to use where the target language is a productive compounding one. The method includes outputting decisions on merging of pairs of words in a translated text string with a merging system. The merging system can include a set of stored heuristics and/or a merging model. In the case of heuristics, these can include a heuristic by which two consecutive words in the string are considered for merging if the first word of the two consecutive words is recognized as a compound modifier and their observed frequency f1 as a closed compound word is larger than an observed frequency f2 of the two consecutive words as a bigram. In the case of a merging model, it can be one that is trained on features associated with pairs of consecutive tokens of text strings in a training set and predetermined merging decisions for the pairs. A translation in the target language is output, based on the merging decisions for the translated text string.
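The frequency heuristic described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the lookup tables (`modifiers`, `compound_freq`, `bigram_freq`), the toy counts, and the choice to merge by plain concatenation in a single left-to-right pass are all assumptions made for illustration; the abstract does not specify how the frequencies are stored or how merged words are formed.

```python
from collections import Counter


def should_merge(first, second, modifiers, compound_freq, bigram_freq):
    """Apply the heuristic: merge only if `first` is a known compound
    modifier and the closed-compound frequency f1 exceeds the bigram
    frequency f2."""
    if first not in modifiers:
        return False
    f1 = compound_freq[first + second]    # frequency as one closed compound
    f2 = bigram_freq[(first, second)]     # frequency as two separate words
    return f1 > f2


def merge_tokens(tokens, modifiers, compound_freq, bigram_freq):
    """Single left-to-right pass over consecutive token pairs.
    Merging by simple concatenation is an assumption of this sketch."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and should_merge(
            tokens[i], tokens[i + 1], modifiers, compound_freq, bigram_freq
        ):
            merged.append(tokens[i] + tokens[i + 1])
            i += 2  # both tokens consumed by the merge
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Hypothetical toy data; the counts are invented for illustration only.
modifiers = {"regierungs"}
compound_freq = Counter({"regierungspolitik": 12})
bigram_freq = Counter({("regierungs", "politik"): 1})

result = merge_tokens(
    ["die", "regierungs", "politik"], modifiers, compound_freq, bigram_freq
)
print(result)
```

Using `Counter` means unseen compounds and bigrams default to a frequency of 0, so a pair that was never observed as a closed compound is not merged (0 is not larger than 0). The merging-model alternative mentioned in the abstract would replace `should_merge` with a trained classifier over features of consecutive token pairs.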

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.