Patent · US Active

Building a translation lexicon from comparable, non-parallel corpora

US8234106B2 · kind B2 · utility

Cited by: 42
References: 192
Claims: 28
Family size: 0

Assignee

Inventors

Key dates

Filing date: Oct 8, 2009
Grant date: Jul 31, 2012
Priority date
Expiry date: Oct 8, 2029

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/242
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A machine translation system may use non-parallel monolingual corpora to generate a translation lexicon. The system may identify identically spelled words in the two corpora, and use them as a seed lexicon. The system may use various clues, e.g., context and frequency, to identify and score other possible translation pairs, using the seed lexicon as a basis. An alternative system may use a small bilingual lexicon in addition to non-parallel corpora to learn translations of unknown words and to generate a parallel corpus.
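The seed-and-score idea in the abstract can be illustrated with a minimal sketch. It is not the patented method: the function names (`seed_lexicon`, `context_vector`, `score_pair`), the `min_len` and `window` parameters, the toy corpora, and the choice of cosine similarity are all assumptions made for illustration, and only the context clue is modeled (the frequency clue is omitted).

```python
import math
from collections import Counter

# Toy, hypothetical monolingual corpora (tokenized sentences); not parallel data.
corpus_de = [
    ["der", "hund", "wartet", "im", "hotel"],
    ["der", "hund", "sitzt", "im", "taxi"],
    ["die", "katze", "liegt", "im", "restaurant"],
]
corpus_en = [
    ["the", "dog", "waits", "in", "the", "hotel"],
    ["the", "dog", "sits", "in", "a", "taxi"],
    ["the", "cat", "sleeps", "in", "the", "restaurant"],
]

def seed_lexicon(corpus_a, corpus_b, min_len=4):
    """Map each word spelled identically in both corpora to itself.

    min_len is an assumed filter to skip short accidental matches.
    """
    vocab_a = {w for sent in corpus_a for w in sent}
    vocab_b = {w for sent in corpus_b for w in sent}
    return {w: w for w in vocab_a & vocab_b if len(w) >= min_len}

def context_vector(word, corpus, seed_words, window=4):
    """Count seed words that occur within `window` tokens of `word`."""
    counts = Counter()
    for sent in corpus:
        for i, tok in enumerate(sent):
            if tok != word:
                continue
            lo = max(0, i - window)
            hi = min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i and sent[j] in seed_words:
                    counts[sent[j]] += 1
    return counts

def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def score_pair(src, tgt, corpus_a, corpus_b, seed):
    """Score a candidate translation pair by comparing seed-word contexts."""
    u = context_vector(src, corpus_a, set(seed))
    # Project the source-side context into the target language via the seed lexicon.
    u_mapped = Counter({seed[k]: c for k, c in u.items()})
    v = context_vector(tgt, corpus_b, set(seed.values()))
    return cosine(u_mapped, v)

seed = seed_lexicon(corpus_de, corpus_en)
score_dog = score_pair("hund", "dog", corpus_de, corpus_en, seed)
score_cat = score_pair("hund", "cat", corpus_de, corpus_en, seed)
```

On this toy data the seed lexicon contains the identically spelled words, and "hund" shares its seed-word contexts with "dog" but not with "cat", so the first pair scores higher. A real system would combine several clues and work with far larger corpora.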

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.