Machine translation method for PDF file
US8108202B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | May 29, 2008 |
| Grant date | Jan 31, 2012 |
| Priority date | — |
| Expiry date | Nov 7, 2030 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/143
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Disclosed is a machine translation method for a PDF file. A machine translation device extracts source language text and non-text elements from an input source language PDF file through image transformation, corrects the extracted text using the source language text obtained from the file's text information, and restores any passages that the non-text elements separate contextually. It then rearranges the extracted text and non-text to follow the contextual flow of the source language PDF file and generates a source language XML/HTML file. Next, the device separates the source language text from the tags of the XML/HTML file and generates target language text using translation knowledge and a transformation engine specific to the technical field of the source language PDF file. Finally, it inserts the translated target language text into the XML/HTML file in place of the source language text and transforms the resulting target language XML/HTML file into a target language PDF file for output.
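The tag-separation and reinsertion steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes the intermediate file is HTML, uses the standard library's `html.parser`, and replaces the domain-specific translation engine with a hypothetical word-for-word glossary (`TOY_GLOSSARY`, `toy_translate`). The names `TagPreservingTranslator` and `translate_markup` are likewise invented for illustration.

```python
from html import escape
from html.parser import HTMLParser


class TagPreservingTranslator(HTMLParser):
    """Copies tags through unchanged and sends only text nodes to `translate`."""

    def __init__(self, translate):
        super().__init__(convert_charrefs=True)
        self.translate = translate  # callable: source text -> target text
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Reuse the tag exactly as written so attributes and layout hints survive.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are also copied verbatim.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if data.strip():
            # Text node: translate it, then re-escape for safe reinsertion.
            self.out.append(escape(self.translate(data), quote=False))
        else:
            # Whitespace-only nodes carry layout, not language; keep them as-is.
            self.out.append(data)


def translate_markup(markup, translate):
    """Return `markup` with every text node replaced by its translation."""
    parser = TagPreservingTranslator(translate)
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


# Hypothetical stand-in for the field-specific translation knowledge.
TOY_GLOSSARY = {"Hello": "Bonjour", "world": "monde"}


def toy_translate(text):
    # Word-for-word lookup; unknown words pass through unchanged.
    return " ".join(TOY_GLOSSARY.get(word, word) for word in text.split(" "))


if __name__ == "__main__":
    print(translate_markup('<p class="a">Hello <b>world</b></p>', toy_translate))
```

Because only text nodes reach the translator, the markup that records the page layout is left untouched, which is what would let a later step convert the target language file back into a PDF with the original structure. A real system would likely translate whole sentences rather than individual text nodes, since inline tags can split a sentence across several nodes.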
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.