Patent · US Expired

Constructing a translation lexicon from comparable, non-parallel corpora

US7620538B2 · kind B2 · utility

Cited by: 77
References: 7
Claims: 6
Family size: 0

Assignee

Inventors

Key dates

Filing date: Mar 26, 2003
Grant date: Nov 17, 2009
Priority date
Expiry date: May 27, 2025

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/242
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A machine translation system may use non-parallel monolingual corpora to generate a translation lexicon. The system may identify identically spelled words in the two corpora, and use them as a seed lexicon. The system may use various clues, e.g., context and frequency, to identify and score other possible translation pairs, using the seed lexicon as a basis. An alternative system may use a small bilingual lexicon in addition to non-parallel corpora to learn translations of unknown words and to generate a parallel corpus.
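The seed-and-score idea in the abstract can be illustrated with a small sketch. This is not the patented method, only a toy version under stated assumptions: corpora are lists of tokenized sentences, identically spelled words of at least `min_len` characters form the seed lexicon, and the only clue used is context (cosine similarity of seed-word co-occurrence counts). The function names, the `min_len` and `window` parameters, and the cosine measure are all hypothetical choices made for illustration; the frequency clue mentioned in the abstract is not modeled.

```python
import math
from collections import Counter


def seed_lexicon(corpus_a, corpus_b, min_len=4):
    """Treat words spelled identically in both corpora as seed translation pairs.

    min_len is an assumed filter to skip short, accidental matches.
    """
    vocab_a = {tok for sent in corpus_a for tok in sent}
    vocab_b = {tok for sent in corpus_b for tok in sent}
    return {w: w for w in vocab_a & vocab_b if len(w) >= min_len}


def context_vector(word, corpus, seed_words, window=2):
    """Count seed words occurring within `window` tokens of each use of `word`."""
    counts = Counter()
    for sent in corpus:
        for i, tok in enumerate(sent):
            if tok != word:
                continue
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i and sent[j] in seed_words:
                    counts[sent[j]] += 1
    return counts


def cosine(u, v):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def score_pair(word_a, word_b, corpus_a, corpus_b, seed):
    """Score a candidate translation pair by comparing seed-word contexts."""
    vec_a = context_vector(word_a, corpus_a, set(seed))
    # Map source-side seed words to their target-side counterparts.
    mapped_a = Counter({seed[k]: c for k, c in vec_a.items()})
    vec_b = context_vector(word_b, corpus_b, set(seed.values()))
    return cosine(mapped_a, vec_b)
```

In this sketch a candidate pair scores higher when the two words tend to appear near the same seed words in their respective corpora; a fuller system would combine this with other clues, such as the frequency clue the abstract names.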

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.