Physical page layout analysis via tab-stop detection for optical character recognition
US8249356B1 · kind B1 · utility
Assignee
Inventor
Key dates
| Filing date | Jan 21, 2009 |
| Grant date | Aug 21, 2012 |
| Priority date | — |
| Expiry date | Jun 2, 2031 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06V30/414
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Physical page layout analysis for optical character recognition is performed. A physical page layout analysis method finds constituent parts of an image and gives each an initial data-type label, such as text or non-text. Within the text data, connected components are identified and analyzed. Tab-stops are detected from groups of edge-aligned connected components. The detected tab-stops are used to deduce the column layout of the page by finding column partitions. The column layout is then applied to find the polygonal boundaries and the reading order of regions containing flowing text, headings, and pull-outs.
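The tab-stop step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: it assumes connected components have already been reduced to bounding boxes, looks only at left edges, and ignores vertical adjacency, right edges, and the later column-partition step. The names `Box`, `find_left_tab_stops`, `tolerance`, and `min_support` are hypothetical and chosen for this illustration only.

```python
from collections import namedtuple

# Hypothetical bounding box of one connected component, in pixels.
Box = namedtuple("Box", "left top right bottom")


def find_left_tab_stops(boxes, tolerance=3, min_support=3):
    """Group boxes whose left edges align within `tolerance` pixels.

    Returns a list of (x_position, support) tuples, sorted by x, for every
    group that contains at least `min_support` boxes. This is a simplified
    stand-in for tab-stop detection, not the procedure claimed in the patent.
    """
    lefts = sorted(box.left for box in boxes)
    stops = []
    group = []
    for x in lefts:
        # Close the current group once x drifts too far from its first edge.
        if group and x - group[0] > tolerance:
            if len(group) >= min_support:
                stops.append((round(sum(group) / len(group)), len(group)))
            group = []
        group.append(x)
    # Flush the final group.
    if len(group) >= min_support:
        stops.append((round(sum(group) / len(group)), len(group)))
    return stops


if __name__ == "__main__":
    boxes = [
        Box(100, 10, 180, 20), Box(101, 30, 200, 40), Box(102, 50, 190, 60),
        Box(250, 10, 300, 20),  # isolated edge, too little support
        Box(400, 10, 480, 20), Box(401, 30, 470, 40), Box(402, 50, 490, 60),
    ]
    print(find_left_tab_stops(boxes))  # [(101, 3), (401, 3)]
```

In this toy input the two well-supported left edges near x = 101 and x = 401 would be the candidate tab-stops, and the lone component at x = 250 is discarded. A fuller treatment would presumably also require the aligned components to be vertically close to one another before accepting a tab-stop.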
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.