Patent · US · Expired

Terminology translation for unaligned comparable corpora using category based translation probabilities

US6885985B2 · kind B2 · utility

Cited by: 58
References: 15
Claims: 20
Family size: 0

Assignee

Inventor

Key dates

Filing date: Dec 18, 2000
Grant date: Apr 26, 2005
Priority date
Expiry date: Oct 13, 2023

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/49
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

The invention relates to a method and apparatus for generating translations of natural language terms from a first language to a second language. A plurality of terms are extracted from unaligned comparable corpora of the first and second languages. Comparable corpora are sets of documents in different languages that come from the same domain and have similar genre and content. Unaligned documents are not translations of one another and are not linked in any other way. By accessing monolingual thesauri of the first and second languages, a category is assigned to each extracted term. Then, category-to-category translation probabilities are estimated, and using said category-to-category translation probabilities, term-to-term translation probabilities are estimated. The invention preferably exploits class-based normalization of probability estimates, bi-directionality, and relative frequency normalization. The most important applications are cross-language text retrieval, semi-automatic bilingual thesaurus enhancement, and machine-aided human translation.
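The step from category-level to term-level probabilities described in the abstract can be illustrated with a small sketch. This is not the patented procedure, which the abstract does not spell out. It assumes a common class-based decomposition, P(t | s) ≈ P(cat(t) | cat(s)) · P(t | cat(t)), where the second factor is the relative frequency of the target term within its category. The function name, the toy English and German terms, the category labels, the frequencies, and the category-to-category probabilities are all hypothetical, and the category-to-category table is taken as given rather than estimated from corpora.

```python
from collections import defaultdict


def term_translation_probs(src_term, src_cats, tgt_cats, tgt_freqs, cat_probs):
    """Score target terms for one source term (illustrative assumption):
    P(t | s) ~= P(cat(t) | cat(s)) * P(t | cat(t))."""
    # Total corpus frequency of the target terms in each target category.
    cat_totals = defaultdict(int)
    for term, cat in tgt_cats.items():
        cat_totals[cat] += tgt_freqs[term]

    s_cat = src_cats[src_term]
    scores = {}
    for term, cat in tgt_cats.items():
        if cat_totals[cat] == 0:
            continue  # no frequency evidence for this category
        # Category-to-category probability; unseen pairs score zero.
        p_cat = cat_probs.get((s_cat, cat), 0.0)
        # Relative frequency of the term within its own category.
        p_term_given_cat = tgt_freqs[term] / cat_totals[cat]
        scores[term] = p_cat * p_term_given_cat
    return scores


# Hypothetical toy data: categories as they might come from monolingual thesauri.
src_cats = {"bank": "FINANCE"}
tgt_cats = {"Bank": "FINANZEN", "Kredit": "FINANZEN", "Ufer": "GEOGRAPHIE"}
tgt_freqs = {"Bank": 30, "Kredit": 10, "Ufer": 20}
# Assumed category-to-category translation probabilities.
cat_probs = {("FINANCE", "FINANZEN"): 0.9, ("FINANCE", "GEOGRAPHIE"): 0.1}

scores = term_translation_probs("bank", src_cats, tgt_cats, tgt_freqs, cat_probs)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In this toy setup the candidates are ranked by combining how likely the source category maps to each target category with how prominent each target term is inside its category. A bi-directional variant, as mentioned in the abstract, would presumably combine this score with the one computed in the reverse direction; that combination is not shown here.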

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.