Aligning hierarchal and sequential document trees to identify parallel data
US7805289B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jul 10, 2006 |
| Grant date | Sep 28, 2010 |
| Priority date | — |
| Expiry date | Feb 14, 2029 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/9558
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A set of candidate parallel pages is identified based on trigger words in one or more pages downloaded from a given network location (such as a website). The document trees representing the candidate pages are aligned to identify translationally parallel content and hyperlinks. The parallel content is then fed into a conventional sentence aligner to extract parallel sentences. The parallel hyperlinks usually refer to other parallel documents, and so lead to recursive mining of parallel documents.
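The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the names `AnchorCollector`, `candidate_links`, and `mine_pairs` are hypothetical, trigger words are matched only against hyperlink anchor text (an assumption), and the tree alignment step is left as a caller-supplied `align_links` function because the abstract does not specify it.

```python
from collections import deque
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects (href, anchor_text) pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an <a href=...> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def candidate_links(html, trigger_words):
    """Return hrefs whose anchor text contains a trigger word.

    Assumption: trigger words are matched case-insensitively as substrings.
    """
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    triggers = [w.lower() for w in trigger_words]
    return [href for href, text in parser.links
            if any(t in text.lower() for t in triggers)]


def mine_pairs(seed_pair, fetch, align_links, max_pairs=100):
    """Breadth-first mining of parallel page pairs.

    fetch(url) returns page content; align_links(page_a, page_b) returns
    (url_a, url_b) pairs of parallel hyperlinks. Both are hypothetical
    callables supplied by the caller.
    """
    seen = {seed_pair}
    queue = deque([seed_pair])
    while queue:
        url_a, url_b = queue.popleft()
        for pair in align_links(fetch(url_a), fetch(url_b)):
            if pair not in seen and len(seen) < max_pairs:
                seen.add(pair)
                queue.append(pair)
    return seen
```

For example, `candidate_links(page_html, ["english", "中文"])` would return the targets of links labelled with either word, and each resulting page pair could seed `mine_pairs`. The `max_pairs` cap is an illustrative safeguard against unbounded crawling.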
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.