Building a translation lexicon from comparable, non-parallel corpora
US8234106B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Oct 8, 2009 |
| Grant date | Jul 31, 2012 |
| Priority date | — |
| Expiry date | Oct 8, 2029 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/242
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A machine translation system may use non-parallel monolingual corpora to generate a translation lexicon. The system may identify identically spelled words in the two corpora, and use them as a seed lexicon. The system may use various clues, e.g., context and frequency, to identify and score other possible translation pairs, using the seed lexicon as a basis. An alternative system may use a small bilingual lexicon in addition to non-parallel corpora to learn translations of unknown words and to generate a parallel corpus.
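The seed-and-score idea in the abstract can be illustrated with a small sketch. This is not the patented method: it assumes tokenized sentences, a hypothetical minimum word length for seed words, a fixed context window, and cosine similarity as the context score, and it omits the frequency clue entirely. All function names, parameters, and the toy corpora are invented for illustration.

```python
from collections import Counter
import math


def seed_lexicon(corpus_a, corpus_b, min_len=4):
    """Pair words spelled identically in both corpora.

    The min_len filter is an assumption, meant to skip short accidental matches.
    """
    vocab_a = {tok for sent in corpus_a for tok in sent}
    vocab_b = {tok for sent in corpus_b for tok in sent}
    return {w: w for w in vocab_a & vocab_b if len(w) >= min_len}


def context_vector(word, corpus, seed_words, window=3):
    """Count seed words occurring within `window` tokens of `word`."""
    counts = Counter()
    for sent in corpus:
        for i, tok in enumerate(sent):
            if tok != word:
                continue
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i and sent[j] in seed_words:
                    counts[sent[j]] += 1
    return counts


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def score_pair(word_a, word_b, corpus_a, corpus_b, seed):
    """Score a candidate pair by comparing contexts through the seed lexicon."""
    ctx_a = context_vector(word_a, corpus_a, set(seed))
    # Map source-side seed words to their target-side spellings.
    mapped_a = Counter({seed[k]: c for k, c in ctx_a.items()})
    ctx_b = context_vector(word_b, corpus_b, set(seed.values()))
    return cosine(mapped_a, ctx_b)


# Toy, non-parallel corpora (invented for this sketch).
german = [
    ["das", "hotel", "hat", "einen", "garten"],
    ["das", "taxi", "wartet", "am", "hotel"],
    ["der", "garten", "am", "hotel", "ist", "klein"],
    ["das", "taxi", "faehrt", "zum", "restaurant"],
]
english = [
    ["the", "hotel", "has", "a", "garden"],
    ["the", "taxi", "waits", "at", "the", "hotel"],
    ["the", "garden", "at", "the", "hotel", "is", "small"],
    ["the", "taxi", "drives", "to", "the", "restaurant"],
]

seed = seed_lexicon(german, english)
good = score_pair("garten", "garden", german, english, seed)
bad = score_pair("garten", "drives", german, english, seed)
print(seed, good, bad)
```

On these toy corpora, "garten" and "garden" both occur near the seed word "hotel", so that pair scores higher than "garten" and "drives", whose contexts share no seed words. A fuller version would combine this context score with other clues, such as the frequency clue the abstract mentions.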
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.