Multi-domain machine translation model adaptation
US9235567B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Event | Date |
|---|---|
| Filing date | Jan 14, 2013 |
| Grant date | Jan 12, 2016 |
| Priority date | — |
| Expiry date | Jan 20, 2034 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/44
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method adapted to multiple corpora includes training a statistical machine translation model which outputs a score for a candidate translation, in a target language, of a text string in a source language. The training includes learning a weight for each of a set of lexical coverage features that are aggregated in the statistical machine translation model. The lexical coverage features include a lexical coverage feature for each of a plurality of parallel corpora. Each of the lexical coverage features represents a relative number of words of the text string for which the respective parallel corpus contributed a biphrase to the candidate translation. The method may also include learning a weight for each of a plurality of language model features, the language model features comprising one language model feature for each of the domains.
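The per-corpus lexical coverage features and per-domain language model features described in the abstract can be illustrated with a minimal Python sketch. It assumes that each biphrase in a candidate translation is tagged with exactly one contributing parallel corpus, that "relative number of words" means the fraction of source words covered by that corpus's biphrases, and that per-domain language model log-probabilities are computed elsewhere and passed in. The names `Biphrase`, `lexical_coverage_features`, and `log_linear_score`, the corpus names, and all numeric values are hypothetical. The weights are placeholders, not values produced by the training step the patent describes.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Biphrase:
    """Hypothetical source/target phrase pair used in a candidate translation."""
    src_start: int  # index of the first source word covered (inclusive)
    src_end: int    # index one past the last source word covered (exclusive)
    target: str     # target-language phrase
    corpus: str     # assumption: the single parallel corpus that contributed it


def lexical_coverage_features(
    num_src_words: int, biphrases: List[Biphrase], corpora: Sequence[str]
) -> Dict[str, float]:
    """One feature per corpus: fraction of source words it covered (assumed definition)."""
    if num_src_words <= 0:
        raise ValueError("source string must contain at least one word")
    covered = {c: 0 for c in corpora}
    for bp in biphrases:
        if bp.corpus not in covered:
            raise ValueError(f"unknown corpus: {bp.corpus}")
        covered[bp.corpus] += bp.src_end - bp.src_start
    return {f"cov_{c}": covered[c] / num_src_words for c in corpora}


def log_linear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of feature values, as in a log-linear SMT model."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


if __name__ == "__main__":
    # "le chat noir dort" -> "the black cat sleeps", built from two corpora.
    candidate = [
        Biphrase(0, 3, "the black cat", "news"),
        Biphrase(3, 4, "sleeps", "legal"),
    ]
    feats = lexical_coverage_features(4, candidate, ["news", "legal", "web"])
    # Hypothetical per-domain language model log-probabilities for the candidate.
    feats.update({"lm_news": -12.0, "lm_legal": -15.0})
    weights = {"cov_news": 0.5, "cov_legal": 0.2, "cov_web": 0.1,
               "lm_news": 0.3, "lm_legal": 0.1}
    print(feats)
    print(log_linear_score(feats, weights))
```

Keeping one coverage feature and one language model feature per corpus or domain lets the learned weights express how much the system should trust each source of biphrases and each domain's fluency model, which is the adaptation mechanism the abstract outlines.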
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.