Patent · US Active

Word alignment method and system for improved vocabulary coverage in statistical machine translation

US8612205B2 · kind B2 · utility

Cited by: 37
References: 7
Claims: 21
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jun 14, 2010
Grant date: Dec 17, 2013
Priority date
Expiry date: Jul 16, 2032

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/45
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A system and method for generating word alignments from pairs of aligned text strings are provided. A corpus of text strings provides pairs of text strings, primarily sentences, in source and target languages. A first alignment between a text string pair creates links therebetween. Each link links a single token of the first text string to a single token of the second text string. A second alignment also creates links between the text string pair. In some cases, these links may correspond to bi-phrases. A modified first alignment is generated by selectively modifying links in the first alignment which include a word which is infrequent in the corpus, based on links generated in the second alignment. This results in removing at least some of the links for the infrequent words, allowing more compact and better quality bi-phrases, with higher vocabulary coverage, to be extracted for use in a machine translation system.
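
The selective modification described in the abstract can be illustrated with a minimal Python sketch. This is not the claimed method. It assumes that alignments are sets of (source index, target index) links, that a word is "infrequent" when its corpus count falls below a hypothetical threshold `min_count`, and that a link touching an infrequent word is kept only when the second alignment also contains it. The names `token_counts` and `modify_alignment` and the toy corpus are illustrative only.

```python
from collections import Counter


def token_counts(corpus):
    """Count token occurrences over one side of a tokenized corpus."""
    return Counter(tok for sentence in corpus for tok in sentence)


def modify_alignment(src, tgt, first, second, src_counts, tgt_counts, min_count=2):
    """Return a modified copy of the first alignment.

    Assumed rule (not taken from the patent text): a link that touches an
    infrequent word is kept only if the second alignment also contains it.
    Links between frequent words pass through unchanged.
    """
    modified = set()
    for i, j in first:
        rare = src_counts[src[i]] < min_count or tgt_counts[tgt[j]] < min_count
        if not rare or (i, j) in second:
            modified.add((i, j))
    return modified


# Toy corpus: "ocelot" occurs once on each side, every other word three times.
src_corpus = [["the", "cat", "sleeps"], ["the", "cat", "sleeps"], ["the", "ocelot", "sleeps"]]
tgt_corpus = [["le", "chat", "dort"], ["le", "chat", "dort"], ["le", "ocelot", "dort"]]
src_counts = token_counts(src_corpus)
tgt_counts = token_counts(tgt_corpus)

src, tgt = src_corpus[2], tgt_corpus[2]
# First alignment: the rare word "ocelot" is also linked to "dort".
first = {(0, 0), (1, 1), (1, 2), (2, 2)}
# Second alignment: links "ocelot" only to "ocelot".
second = {(0, 0), (1, 1), (2, 2)}

modified = modify_alignment(src, tgt, first, second, src_counts, tgt_counts)
print(sorted(modified))
```

In this toy case the extra link from "ocelot" to "dort" is dropped because it involves an infrequent word and the second alignment does not contain it. Links between frequent words are left untouched even when the second alignment disagrees.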

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.