Patent · US Expired

Statistical method and apparatus for learning translation relationships among words

US7191115B2 · kind B2 · utility

Cited by	77
References	5
Claims	7
Family size	0

Assignee

Microsoft Corporation · US

Inventor

Robert C. Moore · Mercer Island, US

Key dates

Filing date	Jun 17, 2002
Grant date	Mar 13, 2007
Priority date	—
Expiry date	Jul 21, 2024

Classification

Technology area (CPC G)	Physics
CPC primary	G06F40/40
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A parallel bilingual training corpus is parsed into its content words. Word association scores are computed for each pair of content words consisting of a word of language L1 that occurs in a sentence and a word of language L2 that occurs in the sentence aligned to it in the bilingual corpus. A pair of words is considered “linked” in a pair of aligned sentences if one of the words is the most highly associated, of all the words in its sentence, with the other word. Compounds are then hypothesized in the training data by identifying maximal, connected sets of linked words in each pair of aligned sentences in the processed and scored training data. Whenever one of these maximal, connected sets contains more than one word in either or both of the languages, the subset of the words in that language is hypothesized as a compound.
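The abstract describes three stages: score L1/L2 word pairs, link each word to its best partner in the aligned sentence, then read compounds off the maximal connected sets of links. The Python sketch below shows one way to express those stages under stated assumptions. It uses Dunning's log-likelihood-ratio statistic as the association score (the abstract does not name a measure), counts co-occurrence once per aligned sentence pair, breaks ties toward the leftmost word, and finds connected sets with union-find. The function names `llr`, `association_scores`, `link_words`, and `hypothesize_compounds` are hypothetical and are not taken from the patent.

```python
# Sketch of the abstract's pipeline. Function names are hypothetical, and the
# log-likelihood-ratio score is an assumed choice of association measure.
import math
from collections import Counter


def llr(k11, k12, k21, k22):
    """Dunning's log-likelihood-ratio (G^2) statistic for a 2x2 contingency table."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    g2 = 0.0
    for k, i, j in ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1)):
        if k > 0:  # empty cells contribute 0 (0 * log 0 is taken as 0)
            g2 += k * math.log(k * n / (rows[i] * cols[j]))
    return 2.0 * g2


def association_scores(corpus):
    """Score each (L1 word, L2 word) pair co-occurring in an aligned sentence pair.

    corpus is a list of (L1 content words, L2 content words) pairs; counts are
    sentence-pair occurrence counts (an assumption), not raw token frequencies.
    """
    n = len(corpus)
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in corpus:
        s_types, t_types = set(src), set(tgt)
        src_count.update(s_types)
        tgt_count.update(t_types)
        joint.update((s, t) for s in s_types for t in t_types)
    scores = {}
    for (s, t), k11 in joint.items():
        k12 = src_count[s] - k11  # s present, t absent
        k21 = tgt_count[t] - k11  # t present, s absent
        k22 = n - src_count[s] - tgt_count[t] + k11  # neither present
        scores[(s, t)] = llr(k11, k12, k21, k22)
    return scores


def link_words(src, tgt, scores):
    """Return (i, j) position pairs where src[i] and tgt[j] are linked.

    A pair is linked if either word is the most highly associated, among all
    words of its own sentence, with the other word. Ties go to the leftmost word.
    """
    links = set()
    if not src or not tgt:
        return links
    for i, s in enumerate(src):
        links.add((i, max(range(len(tgt)), key=lambda k: scores.get((s, tgt[k]), 0.0))))
    for j, t in enumerate(tgt):
        links.add((max(range(len(src)), key=lambda k: scores.get((src[k], t), 0.0)), j))
    return links


def hypothesize_compounds(src, tgt, scores):
    """Return (L1 compounds, L2 compounds) hypothesized for one aligned sentence pair."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Union-find merges linked words into maximal connected sets.
    for i, j in link_words(src, tgt, scores):
        parent[find(("L1", i))] = find(("L2", j))

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), []).append(node)

    l1_compounds, l2_compounds = [], []
    for members in groups.values():
        l1 = sorted(pos for side, pos in members if side == "L1")
        l2 = sorted(pos for side, pos in members if side == "L2")
        # More than one word of a language in one set -> hypothesized compound.
        if len(l1) > 1:
            l1_compounds.append(tuple(src[p] for p in l1))
        if len(l2) > 1:
            l2_compounds.append(tuple(tgt[p] for p in l2))
    return sorted(l1_compounds), sorted(l2_compounds)


if __name__ == "__main__":
    corpus = [  # toy French (L1) / English (L2) content words, function words removed
        (["pomme", "terre"], ["potato"]),
        (["pomme", "terre", "rouge"], ["red", "potato"]),
        (["pomme", "rouge"], ["red", "apple"]),
        (["pomme"], ["apple"]),
    ]
    scores = association_scores(corpus)
    for src, tgt in corpus:
        print(src, tgt, hypothesize_compounds(src, tgt, scores))
```

Running the script prints the compounds hypothesized for each aligned pair. Because every word in a non-empty sentence pair is linked to at least one word on the other side, each word ends up in exactly one connected set, and union-find recovers those sets in near-linear time. On a corpus this small the scores are only illustrative.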

Source: USPTO / EPO open patent data (objective bibliographic data and citation counts).