Patent · US Active

Physical page layout analysis via tab-stop detection for optical character recognition

US8249356B1 · kind B1 · utility

Cited by: 16
References: 0
Claims: 26
Family size: 0

Assignee

Inventor

Key dates

Filing date: Jan 21, 2009
Grant date: Aug 21, 2012
Priority date
Expiry date: Jun 2, 2031

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06V30/414
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Physical page layout analysis for optical character recognition is performed. A physical page layout analysis method finds the constituent parts of an image and gives each an initial data-type label, such as text or non-text. Within the text data, connected components are identified and analyzed. Tab-stops are detected from groups of edge-aligned connected components. The detected tab-stops are used to deduce the column layout of the page by finding column partitions. The column layout is then applied to find the polygonal boundaries and the reading order of regions containing flowing text, headings, and pull-outs.
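
The tab-stop step in the abstract can be illustrated with a minimal sketch. It is not the patented method: it assumes connected components have already been reduced to bounding boxes, and it treats any x-position where enough box edges line up within a small tolerance as a candidate tab-stop. The names Box and detect_tab_stops and the parameters tolerance and min_support are hypothetical and chosen only for illustration. The sketch also ignores vertical adjacency between components, which a practical detector would likely need to check.

```python
from collections import namedtuple

# Hypothetical bounding box of one connected component, in pixel coordinates.
Box = namedtuple("Box", "left top right bottom")


def detect_tab_stops(boxes, edge="left", tolerance=3, min_support=3):
    """Return candidate tab-stop x-positions for the chosen edge.

    Edge coordinates are sorted and swept once. A group collects values
    lying within `tolerance` pixels of the group's first value, and a group
    becomes a candidate tab-stop only if it has at least `min_support`
    members. Both thresholds are illustrative assumptions.
    """
    xs = sorted(getattr(b, edge) for b in boxes)
    stops = []
    group = []
    for x in xs:
        # Close the current group once the next edge is too far away.
        if group and x - group[0] > tolerance:
            if len(group) >= min_support:
                stops.append(round(sum(group) / len(group)))
            group = []
        group.append(x)
    # Flush the last group.
    if len(group) >= min_support:
        stops.append(round(sum(group) / len(group)))
    return stops


# Two columns of three text lines each, plus one stray component.
boxes = [
    Box(10, 0, 180, 10), Box(11, 12, 181, 22), Box(10, 24, 120, 34),
    Box(200, 0, 380, 10), Box(201, 12, 379, 22), Box(199, 24, 380, 34),
    Box(75, 40, 140, 50),
]
left_stops = detect_tab_stops(boxes, edge="left")
right_stops = detect_tab_stops(boxes, edge="right")
```

In this toy input the left edges cluster near two x-positions, one per column, while the stray component at x = 75 has no aligned neighbours and is discarded. Column partitions could then be hypothesized between neighbouring left and right candidates, though the abstract does not say how the patent performs that step.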

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.