Patent · US Expired

Aligning source texts of different natural languages to produce or add to an aligned corpus

US5893134A · kind A · utility

Cited by: 67
References: 9
Claims: 15
Family size: 0

Assignees

Inventors

Key dates

Filing date: May 21, 1996
Grant date: Apr 6, 1999
Priority date
Expiry date: May 21, 2016

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/103
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A plurality of source text files are read, representing similar information but in different natural languages. The files have correlated layouts, in that the same layout commands are employed at similar points in the files. Similar text, from the respective files, is aligned by identifying its position between equivalent word processing commands. Preferably, intermediate files are produced in which the word processing (WP) commands are converted into an identifiable form. Aligned text, which differs between the intermediate files whereas WP commands will not differ, is identified by a differential comparison operation, such as a call to DIFF within a UNIX environment.
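
The alignment idea in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes each intermediate file has already been reduced to a list of lines in which every word processing command appears on its own line as a marker such as `<<WP:PARA>>` (a hypothetical format chosen for this example), and it uses Python's standard `difflib` module in place of a call to the UNIX `diff` utility. The function name `align` and the sample sentences are likewise illustrative.

```python
import difflib


def align(lines_a, lines_b):
    """Pair up text that sits between matching layout-command lines.

    Command lines are identical in both files, so the matcher reports
    them as 'equal' runs. The text between them differs by language and
    is reported as 'replace' runs, which are taken as aligned segments.
    """
    matcher = difflib.SequenceMatcher(a=lines_a, b=lines_b, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            pairs.append((" ".join(lines_a[i1:i2]), " ".join(lines_b[j1:j2])))
    return pairs


# Hypothetical intermediate files: same layout commands, different languages.
english = [
    "<<WP:HEADING>>",
    "Instructions for use",
    "<<WP:PARA>>",
    "Switch off the power.",
    "<<WP:PARA>>",
    "Remove the cover.",
    "<<WP:END>>",
]
french = [
    "<<WP:HEADING>>",
    "Mode d'emploi",
    "<<WP:PARA>>",
    "Coupez le courant.",
    "<<WP:PARA>>",
    "Retirez le capot.",
    "<<WP:END>>",
]

for source, target in align(english, french):
    print(f"{source}  <->  {target}")
```

In this sketch only 'replace' runs are collected. Text present in one file but missing from the other would surface as 'insert' or 'delete' runs and would need separate handling, which the abstract does not describe.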

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.