Patent · US Active

Method for building parallel corpora

US7949514B2 · kind B2 · utility

Cited by: 4
References: 3
Claims: 22
Family size: 0

Assignee

Inventor

Key dates

Filing date: Apr 20, 2007
Grant date: May 24, 2011
Priority date
Expiry date: Mar 23, 2030

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/45
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method for identifying documents for enriching a statistical translation tool includes retrieving a source document which is responsive to a source language query that may be specific to a selected domain. A set of text segments is extracted from the retrieved source document and translated into corresponding target language segments with a statistical translation tool to be enriched. Target language queries based on the target language segments are formulated. Sets of target documents responsive to the target language queries are retrieved. The sets of retrieved target documents are filtered, including identifying any candidate documents which meet a selection criterion that is based on co-occurrence of a document in a plurality of the sets. The candidate documents, where found, are compared with the retrieved source document for determining whether any of the candidate documents match the source document. Matching documents can then be stored and used in turn in a training phase for enriching the translation tool.
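
The filtering step in the abstract can be illustrated with a minimal sketch. It assumes each target language query returns a set of document identifiers and that the selection criterion is a simple count threshold; the abstract only states that the criterion is based on co-occurrence of a document in a plurality of the sets, so the function name cooccurrence_candidates, the min_sets parameter, and the sample identifiers are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set


def cooccurrence_candidates(result_sets: Iterable[Set[str]], min_sets: int = 2) -> List[str]:
    """Return IDs of documents that appear in at least `min_sets` result sets.

    Assumption: a plain count threshold stands in for the selection
    criterion, which the abstract does not specify in detail.
    """
    counts: Counter = Counter()
    for docs in result_sets:
        # Count each document at most once per result set.
        counts.update(set(docs))
    return sorted(doc for doc, n in counts.items() if n >= min_sets)


# Hypothetical result sets, one per target language query.
result_sets = [
    {"doc_a", "doc_b"},
    {"doc_b", "doc_c"},
    {"doc_b", "doc_a", "doc_d"},
]
candidates = cooccurrence_candidates(result_sets, min_sets=2)
print(candidates)
```

Documents that survive this filter would still need to be compared with the retrieved source document, as the abstract describes, before being stored as matches.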

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.