Discovery of parallel text portions in comparable collections of corpora and training using comparable texts
US8296127B2 · kind B2 · utility
Cited by: 60
References: 190
Claims: 29
Family size: 0
Assignee
Inventors
Key dates
| Filing date | Mar 22, 2005 |
| Grant date | Oct 23, 2012 |
| Priority date | — |
| Expiry date | Dec 16, 2029 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/42
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A translation training device extracts a set of parallel sentences from two nonparallel corpora. The system finds parameters between different sentences or phrases in order to identify parallel sentences. The extracted parallel sentences are then used to train a data-driven machine translation system. The process can be repeated until sufficient data is collected or until the performance of the translation system stops improving.
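
The extract-then-retrain loop described in the abstract can be illustrated with a toy sketch. This is not the patented method: the scoring function (`coverage`), the lexicon-update rule (`retrain`), the `threshold` value, and all other names are hypothetical stand-ins chosen for illustration. A word-translation lexicon plays the role of the translation system, and "no new pairs found" is used as a simple proxy for "performance stops improving".

```python
from itertools import product


def coverage(src, tgt, lexicon):
    """Fraction of source words with a lexicon translation present in the target."""
    src_words = src.split()
    tgt_words = set(tgt.split())
    if not src_words:
        return 0.0
    hits = sum(1 for w in src_words if lexicon.get(w, set()) & tgt_words)
    return hits / len(src_words)


def extract_parallel(src_corpus, tgt_corpus, lexicon, threshold):
    """Keep every cross-corpus sentence pair whose coverage meets the threshold."""
    return {
        (s, t)
        for s, t in product(src_corpus, tgt_corpus)
        if coverage(s, t, lexicon) >= threshold
    }


def retrain(lexicon, pairs):
    """Toy 'training' (an assumption): if a pair leaves exactly one source word
    and one target word unexplained, link those two words in the lexicon."""
    updated = {w: set(ts) for w, ts in lexicon.items()}
    for s, t in sorted(pairs):
        tgt_words = set(t.split())
        explained = set()
        unknown_src = []
        for w in s.split():
            matches = updated.get(w, set()) & tgt_words
            if matches:
                explained |= matches
            else:
                unknown_src.append(w)
        unknown_tgt = [w for w in t.split() if w not in explained]
        if len(unknown_src) == 1 and len(unknown_tgt) == 1:
            updated.setdefault(unknown_src[0], set()).add(unknown_tgt[0])
    return updated


def bootstrap(src_corpus, tgt_corpus, seed_lexicon, threshold=0.6, max_rounds=10):
    """Alternate extraction and retraining until no new pairs appear."""
    lexicon = {w: set(ts) for w, ts in seed_lexicon.items()}
    pairs = set()
    for _ in range(max_rounds):
        found = extract_parallel(src_corpus, tgt_corpus, lexicon, threshold)
        if found <= pairs:  # nothing new this round: stop iterating
            break
        pairs |= found
        lexicon = retrain(lexicon, pairs)
    return pairs, lexicon


# Two small nonparallel "corpora" (made-up data) and a seed lexicon.
src = ["le chien dort", "un chien noir"]
tgt = ["the dog sleeps", "a black dog", "stocks fell sharply today"]
seed = {"le": {"the"}, "dort": {"sleeps"}, "un": {"a"}}

pairs, lexicon = bootstrap(src, tgt, seed)
print(sorted(pairs))
```

In this example the second pair only clears the threshold in the second round, after the first round's pair has taught the lexicon that "chien" maps to "dog". That is the point of repeating the process: each round's extracted data improves the model used for the next round's extraction.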
Source: USPTO / EPO open patent data (objective bibliographic data and citation counts).