Identifying parallel bilingual data over a network
US8249855B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Aug 7, 2006 |
| Grant date | Aug 21, 2012 |
| Priority date | — |
| Expiry date | Jan 14, 2027 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/951
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A set of candidate documents, each of which may be part of a bilingual, parallel set of documents, is identified. The candidate documents illustratively include textual material in a source language. For each document in the set, it is then determined whether parallel text can be identified. It is first determined whether the parallel text resides within the document itself. If not, the document is examined for links to other documents, and those linked documents are examined for bilingual parallelism with the selected document. If no parallel text is found among the linked documents, named entities are extracted from the document and translated into the target language. The translations are used to query search engines to retrieve the parallel correspondent for the selected document.
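The three-stage fallback described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Document` type, the function `find_parallel`, and every callable passed into it (`has_internal_parallel`, `fetch`, `is_parallel_pair`, `extract_entities`, `translate`, `search`) are hypothetical names introduced here for illustration. The sketch also assumes that search results are checked with the same parallelism test used for linked documents, which the abstract does not state.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class Document:
    """Hypothetical container for a fetched page."""
    url: str
    text: str
    links: List[str] = field(default_factory=list)


def find_parallel(
    doc: Document,
    *,
    has_internal_parallel: Callable[[Document], bool],
    fetch: Callable[[str], Optional[Document]],
    is_parallel_pair: Callable[[Document, Document], bool],
    extract_entities: Callable[[Document], List[str]],
    translate: Callable[[str], str],
    search: Callable[[List[str]], Iterable[Document]],
) -> Optional[Tuple[str, str]]:
    """Return (stage, url) for the first parallel text found, else None.

    All callables are assumptions supplied by the caller; the abstract
    does not say how each step is carried out.
    """
    # Stage 1: does the parallel text reside within the document itself?
    if has_internal_parallel(doc):
        return ("within_document", doc.url)

    # Stage 2: examine documents that this document links to.
    for url in doc.links:
        linked = fetch(url)
        if linked is not None and is_parallel_pair(doc, linked):
            return ("linked_document", linked.url)

    # Stage 3: translate named entities and use them as a search query.
    query_terms = [translate(entity) for entity in extract_entities(doc)]
    if query_terms:
        for hit in search(query_terms):
            # Assumption: hits are verified with the same pair test.
            if is_parallel_pair(doc, hit):
                return ("search_result", hit.url)

    return None
```

Passing the steps in as callables keeps the sketch independent of any particular parallelism test, named-entity extractor, translation resource, or search engine, none of which the abstract specifies.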
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.