Systems and methods for identifying parallel documents and sentence fragments in multilingual document collections
US8943080B2 · kind B2 · utility
Key dates
| Event | Date |
|---|---|
| Filing date | Dec 5, 2006 |
| Grant date | Jan 27, 2015 |
| Priority date | not listed |
| Expiry date | Jun 13, 2030 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/45
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Systems, computer programs, and methods for identifying parallel documents and/or fragments in a bilingual collection are provided. The method for identifying parallel sub-sentential fragments in a bilingual collection comprises translating a source document from a bilingual collection. The method further includes querying a target library associated with the bilingual collection using the translated source document, and identifying one or more target documents based on the query. Subsequently, a source sentence associated with the source document is aligned to one or more target sentences associated with the one or more target documents. Finally, the method includes determining whether a source fragment associated with the source sentence comprises a parallel translation of a target fragment associated with the one or more target sentences.
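The abstract names four steps (translate the source document, query the target library, align sentences, decide whether fragments are parallel) but does not say how each step is computed. The Python sketch below wires the steps together under loudly stated assumptions. A hypothetical word-for-word `LEXICON` stands in for the translation system. Retrieval is plain word-overlap ranking. Sentence alignment keeps target sentences above a hypothetical `min_overlap` threshold. Fragment detection smooths a per-word +1/-1 signal ("does this word's translation appear in the target sentence?") with a moving average and keeps the longest positive run. Every function name, threshold and data item here is illustrative and is not the patented method.

```python
"""Toy sketch of the four abstract steps; every name and threshold is hypothetical."""

# Hypothetical Spanish->English word list standing in for a real MT system.
LEXICON = {
    "el": "the", "gato": "cat", "negro": "black", "duerme": "sleeps",
    "en": "on", "la": "the", "mesa": "table",
}

# Hypothetical target library: document id -> list of target sentences.
LIBRARY = {
    "d1": ["The black cat sleeps on the table.", "Prices rose today."],
    "d2": ["Markets closed early."],
}

SOURCE_DOC = "El gato negro duerme en la mesa mientras llueve mucho. Hoy hace sol."


def tokenize(text):
    """Lowercase and split on whitespace, dropping periods."""
    return text.lower().replace(".", " ").split()


def split_sentences(text):
    """Naive sentence splitter on periods."""
    return [s.strip() for s in text.split(".") if s.strip()]


def translate(tokens, lexicon):
    """Step 1: word-by-word gloss; words missing from the lexicon are kept unchanged."""
    return [lexicon.get(t, t) for t in tokens]


def query_library(query_tokens, library, top_k=1):
    """Step 2: rank target documents by the number of distinct query words they contain."""
    query = set(query_tokens)
    scored = []
    for doc_id, sentences in library.items():
        words = {w for s in sentences for w in tokenize(s)}
        scored.append((len(query & words), doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored[:top_k] if score > 0]


def align_sentence(src_tokens, candidates, lexicon, min_overlap=0.3):
    """Step 3: keep target sentences covering enough of the glossed source sentence."""
    gloss = set(translate(src_tokens, lexicon))
    aligned = []
    for sentence in candidates:
        overlap = len(gloss & set(tokenize(sentence))) / max(len(gloss), 1)
        if overlap >= min_overlap:
            aligned.append(sentence)
    return aligned


def parallel_fragment(src_tokens, tgt_sentence, lexicon, window=1):
    """Step 4: longest source span whose smoothed translation signal stays positive."""
    tgt = set(tokenize(tgt_sentence))
    # +1 if the word's dictionary translation occurs in the target sentence, else -1.
    signal = [1.0 if lexicon.get(t) in tgt else -1.0 for t in src_tokens]
    # Moving-average smoothing so isolated mismatches do not split a fragment.
    smoothed = []
    for i in range(len(signal)):
        lo, hi = max(0, i - window), min(len(signal), i + window + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    best, start = (0, 0), None
    for i, value in enumerate(smoothed + [-1.0]):  # sentinel closes a trailing run
        if value > 0 and start is None:
            start = i
        elif value <= 0 and start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return src_tokens[best[0]:best[1]]


def find_parallel_fragments(source_doc, library, lexicon):
    """Run steps 1-4 and return (source fragment, target sentence) pairs."""
    hits = query_library(translate(tokenize(source_doc), lexicon), library)
    candidates = [s for doc_id in hits for s in library[doc_id]]
    results = []
    for src_sentence in split_sentences(source_doc):
        src_tokens = tokenize(src_sentence)
        for tgt_sentence in align_sentence(src_tokens, candidates, lexicon):
            fragment = parallel_fragment(src_tokens, tgt_sentence, lexicon)
            if fragment:
                results.append((" ".join(fragment), tgt_sentence))
    return results


if __name__ == "__main__":
    print(find_parallel_fragments(SOURCE_DOC, LIBRARY, LEXICON))
```

On the toy data, the sketch retrieves only `d1`. It pairs the Spanish span "el gato negro duerme en la mesa" with "The black cat sleeps on the table." and drops the untranslated tail "mientras llueve mucho". It also drops the second source sentence, which matches no target sentence. A practical system would replace each toy step with a real MT engine, an information-retrieval index and a trained word-alignment model.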
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.