Aligning source texts of different natural languages to produce or add to an aligned corpus
US5893134A · kind A · utility
Assignees
Inventors
Key dates
| Event | Date |
| --- | --- |
| Filing date | May 21, 1996 |
| Grant date | Apr 6, 1999 |
| Priority date | — |
| Expiry date | May 21, 2016 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/103
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A plurality of source text files are read, representing similar information but in different natural languages. The files have correlated layouts, in that the same layout commands are employed at similar points in the files. Similar text, from the respective files, is aligned by identifying its position between equivalent word processing commands. Preferably, intermediate files are produced in which the word processing (WP) commands are converted into an identifiable form. Aligned text, which differs between the intermediate files whereas WP commands will not differ, is identified by a differential comparison operation, such as a call to DIFF within a UNIX environment.
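The differential comparison described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: Python's `difflib.SequenceMatcher` stands in for the UNIX DIFF call, the `<<HEADING>>` and `<<PARA>>` tokens are a hypothetical "identifiable form" for WP commands, and the function name `align_by_layout` and the sample sentences are invented for illustration. Each intermediate file is assumed to hold one command token or one text segment per line.

```python
from difflib import SequenceMatcher


def align_by_layout(lines_a, lines_b):
    """Pair up text that sits between matching layout-command tokens.

    Command tokens are identical in both files, so the comparison reports
    them as 'equal'. The text between them differs by language, so it is
    reported as 'replace', and those spans are taken as aligned pairs.
    """
    matcher = SequenceMatcher(None, lines_a, lines_b, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            pairs.append((" ".join(lines_a[i1:i2]), " ".join(lines_b[j1:j2])))
    return pairs


# Hypothetical intermediate files: WP commands already converted to tokens.
english = [
    "<<HEADING>>", "Safety instructions",
    "<<PARA>>", "Switch off the power.",
    "<<PARA>>", "Remove the cover.",
]
french = [
    "<<HEADING>>", "Consignes de sécurité",
    "<<PARA>>", "Coupez l'alimentation.",
    "<<PARA>>", "Retirez le couvercle.",
]

pairs = align_by_layout(english, french)
for source, target in pairs:
    print(f"{source}  <->  {target}")
```

The sketch ignores text present in only one file (the `insert` and `delete` results of the comparison); a fuller treatment would need a policy for such unmatched segments.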
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.