Patent · US Active

Collecting training data from TeX files

US10824788B2 · kind B2 · utility

Cited by: 0
References: 2
Claims: 15
Family size: 0

Assignee

Inventors

Key dates

Filing date: Feb 8, 2019
Grant date: Nov 3, 2020
Priority date
Expiry date: Feb 8, 2039

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06N20/00
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method of collecting training data of a document component may be provided. The documents have a structure and are coded in the typesetting language TeX. The method comprises receiving a TeX source file, compiling it into a PDF file and a related sync file, and analyzing the PDF file, thereby determining a non-text-only document component. The method also comprises determining first coordinates of the non-text-only document component and a corresponding page number; determining a typesetting command relating to the non-text-only document component and determining, from the sync file, second coordinates of a bounding box and a corresponding page number; determining text elements in the non-text-only document component of the PDF file for which the first coordinates and the second coordinates overlap; and combining the determined text elements and linking them to a type of a non-text document component determined in the non-text-only document component in the TeX source file.
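
The overlap and linking steps described in the abstract can be illustrated with a minimal Python sketch. It is one possible reading of the abstract, not the claimed implementation. All names here (Box, PdfComponent, overlaps, link_components) are hypothetical, and the coordinates are made-up values. The sketch assumes the PDF analysis and the sync file lookup have already produced page numbers and axis-aligned rectangles in the same coordinate system.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle on a given page (hypothetical structure)."""
    page: int
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass(frozen=True)
class PdfComponent:
    """A non-text-only component found in the PDF (assumed input).

    box   -- the "first coordinates" and page number from the PDF analysis
    texts -- text elements located inside the component
    """
    box: Box
    texts: Tuple[str, ...]


def overlaps(a: Box, b: Box) -> bool:
    """True if both boxes are on the same page and their areas intersect."""
    if a.page != b.page:
        return False
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def link_components(component_type: str, sync_box: Box,
                    components: List[PdfComponent]) -> List[Dict[str, str]]:
    """Combine the text of every PDF component whose box overlaps the
    sync-file bounding box ("second coordinates") and link it to the
    component type taken from the TeX typesetting command."""
    samples = []
    for component in components:
        if overlaps(component.box, sync_box):
            samples.append({
                "type": component_type,
                "text": " ".join(component.texts),
            })
    return samples


# Illustrative values only: a bounding box assumed to come from the sync
# file for a table-like typesetting command on page 3.
sync_box = Box(page=3, x0=100, y0=200, x1=400, y1=350)
components = [
    PdfComponent(Box(3, 110, 210, 390, 340), ("Year", "Revenue")),  # overlaps
    PdfComponent(Box(3, 100, 50, 220, 70), ("Figure", "1")),        # same page, no overlap
    PdfComponent(Box(4, 110, 210, 390, 340), ("Other", "page")),    # different page
]
samples = link_components("table", sync_box, components)
print(samples)
```

Only the first component produces a training sample, because it is the only one on the same page whose rectangle intersects the sync-file bounding box. A real pipeline would also have to convert between the PDF and sync-file coordinate systems before comparing boxes, which this sketch leaves out.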

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.