Word alignment method and system for improved vocabulary coverage in statistical machine translation
US8612205B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Event | Date |
|---|---|
| Filing date | Jun 14, 2010 |
| Grant date | Dec 17, 2013 |
| Priority date | — |
| Expiry date | Jul 16, 2032 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/45
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A system and method for generating word alignments from pairs of aligned text strings are provided. A corpus of text strings supplies pairs of text strings, primarily sentences, in a source language and a target language. A first alignment of a text string pair creates links between the two strings, each link joining a single token of the first text string to a single token of the second text string. A second alignment also creates links between the text string pair; in some cases, these links may correspond to bi-phrases. A modified first alignment is generated by selectively modifying, based on the links generated in the second alignment, those links in the first alignment that include a word that is infrequent in the corpus. This removes at least some of the links for the infrequent words, so that more compact, better-quality bi-phrases with higher vocabulary coverage can be extracted for use in a machine translation system.
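The abstract does not spell out the exact modification rule, so the following is only a minimal sketch under stated assumptions: each alignment is a set of (source index, target index) token links; the second alignment has already been flattened to token links (for example, from its bi-phrases); "infrequent" means a corpus count at or below a hypothetical `max_count` threshold; and a first-alignment link that touches an infrequent word survives only if the second alignment also contains it. The names `rare_words` and `modify_alignment` are hypothetical and are not taken from the patent.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Link = Tuple[int, int]  # (source token index, target token index)


def rare_words(corpus: Iterable[List[str]], max_count: int) -> Set[str]:
    """Return words occurring at most max_count times (hypothetical threshold)."""
    counts = Counter(tok for sentence in corpus for tok in sentence)
    return {word for word, count in counts.items() if count <= max_count}


def modify_alignment(
    src: List[str],
    tgt: List[str],
    first: Set[Link],
    second: Set[Link],
    rare_src: Set[str],
    rare_tgt: Set[str],
) -> Set[Link]:
    """Keep first-alignment links that touch no rare word unchanged.

    Assumed rule: a link touching a rare source or target word is kept
    only if the second alignment proposes the same link; otherwise it
    is removed.
    """
    modified: Set[Link] = set()
    for i, j in first:
        touches_rare = src[i] in rare_src or tgt[j] in rare_tgt
        if not touches_rare or (i, j) in second:
            modified.add((i, j))
    return modified


if __name__ == "__main__":
    src_corpus = [["the", "cat", "sleeps"], ["the", "cat", "eats"], ["the", "okapi", "sleeps"]]
    tgt_corpus = [["le", "chat", "dort"], ["le", "chat", "mange"], ["l'", "okapi", "dort"]]
    rare_src = rare_words(src_corpus, max_count=1)
    rare_tgt = rare_words(tgt_corpus, max_count=1)
    # First alignment of the third pair: rare "okapi" also links to "l'" (spurious).
    first = {(0, 0), (1, 0), (1, 1), (2, 2)}
    # Second alignment, assumed already flattened from bi-phrases to token links.
    second = {(0, 0), (1, 1), (2, 2)}
    result = modify_alignment(src_corpus[2], tgt_corpus[2], first, second, rare_src, rare_tgt)
    print(sorted(result))  # → [(0, 0), (1, 1), (2, 2)]
```

Dropping the spurious link (1, 0) leaves "okapi" aligned to a single target token, which is the kind of tighter alignment from which a compact bi-phrase such as ("okapi", "okapi") could be extracted. A real system would choose the threshold and the agreement test empirically.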
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.