Patent · US Active

Aligning hierarchal and sequential document trees to identify parallel data

US7805289B2 · kind B2 · utility

Cited by: 10
References: 5
Claims: 13
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jul 10, 2006
Grant date: Sep 28, 2010
Priority date
Expiry date: Feb 14, 2029

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/9558
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A set of candidate parallel pages is identified based on trigger words in one or more pages downloaded from a given network location (such as a website). Document trees representing each of the candidate pages are aligned to identify translationally parallel content and hyperlinks. The parallel content is then fed into a conventional sentence aligner to extract parallel sentences. The parallel hyperlinks usually refer to other parallel documents, which leads to recursive mining of parallel documents.
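
The flow in the abstract can be illustrated with a minimal sketch. It is not the patented method: the trigger-word list, the function names (`parse`, `candidate_links`, `align`) and the sample pages are all hypothetical, and Python's `difflib.SequenceMatcher` over flattened node sequences stands in for the tree alignment that the abstract mentions but does not specify.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Hypothetical trigger words; the abstract does not list any.
TRIGGER_WORDS = {"english", "english version", "chinese", "中文"}


class NodeCollector(HTMLParser):
    """Flatten a page into (kind, value) nodes: tags, text, and links."""

    def __init__(self):
        super().__init__()
        self.nodes = []    # ("tag", name) | ("text", str) | ("link", href)
        self.links = []    # (href, anchor text)
        self._href = None  # href of the <a> element currently open, if any
        self._anchor = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") if tag == "a" else None
        if href:
            self._href, self._anchor = href, []
            self.nodes.append(("link", href))
        else:
            self.nodes.append(("tag", tag))

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, " ".join(self._anchor)))
            self._href = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._href is not None:
            self._anchor.append(text)   # anchor text belongs to the link
        else:
            self.nodes.append(("text", text))


def parse(html):
    collector = NodeCollector()
    collector.feed(html)
    collector.close()
    return collector


def candidate_links(html):
    """Return hrefs whose anchor text is a trigger word."""
    return [href for href, text in parse(html).links
            if text.lower() in TRIGGER_WORDS]


def shape(nodes):
    """Keep structure only: text and link values differ across languages."""
    return [(kind, value if kind == "tag" else None) for kind, value in nodes]


def align(html_a, html_b):
    """Pair text nodes and links that occupy matching structural positions."""
    a, b = parse(html_a).nodes, parse(html_b).nodes
    matcher = SequenceMatcher(None, shape(a), shape(b), autojunk=False)
    texts, links = [], []
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            (kind, va), (_, vb) = a[block.a + k], b[block.b + k]
            if kind == "text":
                texts.append((va, vb))
            elif kind == "link":
                links.append((va, vb))
    return texts, links


# Hypothetical sample pages, invented for illustration only.
html_en = ('<html><body><a href="/zh/index.html">中文</a><h1>Welcome</h1>'
           '<p>Hello world.</p><a href="/en/news.html">News</a></body></html>')
html_zh = ('<html><body><a href="/en/index.html">English</a><h1>欢迎</h1>'
           '<p>你好，世界。</p><a href="/zh/news.html">新闻</a></body></html>')

texts, links = align(html_en, html_zh)
```

In this sketch, `candidate_links(html_en)` yields the link to follow to reach the candidate parallel page, `texts` holds the paired text nodes that a sentence aligner could consume, and each pair in `links` could be downloaded and passed to `align` again, which mirrors the recursive mining the abstract describes.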

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.