Line segmentation method applicable to document images containing handwriting and printed text characters or skewed text lines
US9104940B2 · kind B2 · utility
Assignee
Inventor
Key dates
| Filing date | Aug 30, 2013 |
| Grant date | Aug 11, 2015 |
| Priority date | — |
| Expiry date | Jan 31, 2034 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06V30/10
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A text line segmentation method for a document image containing printed text and handwriting, or a document image containing skewed lines of printed text. Connected components (CCs) are obtained for the document image, and their bounding boxes and centroids are calculated. The CCs are categorized into three categories based on bounding box size: small objects, regular text objects, and large objects involving handwriting. The centroids of the regular text objects are used in a cluster analysis to find the vertical centers of the N text lines. Each CC is then classified into one of the N lines based on the vertical distance between its centroid and the vertical centers of the text lines, and is copied into a corresponding object board. Extra space is removed from the object boards to obtain the line segments. A large object involving handwriting is classified into only one of the lines and is absent from the other lines.
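The pipeline described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the patented implementation. The following are all assumptions: the function names, the size thresholds (`small_ratio` and `large_ratio`, taken relative to the median bounding-box height), a simple gap-based 1-D clustering standing in for the unspecified "cluster analysis", pixel-mean centroids, and cropping blank margins as the "extra space" removal. Small and large objects are assigned with the same nearest-center rule as regular text objects. Because of that rule, a tall handwritten component ends up on exactly one board.

```python
from collections import deque

def connected_components(image):
    """8-connected ink components of a binary image (1 = ink), each a list of (row, col)."""
    n_rows, n_cols = len(image), len(image[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    comps = []
    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if image[r0][c0] != 1 or seen[r0][c0]:
                continue
            seen[r0][c0] = True
            queue, pixels = deque([(r0, c0)]), []
            while queue:  # breadth-first flood fill
                r, c = queue.popleft()
                pixels.append((r, c))
                for nr in range(r - 1, r + 2):
                    for nc in range(c - 1, c + 2):
                        if (0 <= nr < n_rows and 0 <= nc < n_cols
                                and image[nr][nc] == 1 and not seen[nr][nc]):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
            comps.append(pixels)
    return comps

def bbox_height(pixels):
    rows = [r for r, _ in pixels]
    return max(rows) - min(rows) + 1

def centroid_y(pixels):
    return sum(r for r, _ in pixels) / len(pixels)

def regular_objects(comps, small_ratio=0.5, large_ratio=2.0):
    """Keep CCs whose bbox height is within [small_ratio, large_ratio] x median.
    Shorter CCs are 'small objects', taller ones 'large objects' (hypothetical thresholds)."""
    heights = sorted(bbox_height(p) for p in comps)
    median = heights[len(heights) // 2]
    return [p for p in comps
            if small_ratio * median <= bbox_height(p) <= large_ratio * median]

def line_centers(regular, gap):
    """Hypothetical gap-based 1-D clustering of centroid rows; N = number of clusters."""
    ys = sorted(centroid_y(p) for p in regular)
    clusters = [[ys[0]]] if ys else []
    for y in ys[1:]:
        if y - clusters[-1][-1] > gap:
            clusters.append([y])  # large vertical gap: start a new text line
        else:
            clusters[-1].append(y)
    return [sum(cl) / len(cl) for cl in clusters]

def remove_extra_space(board):
    """Crop blank rows and columns from the margins of an object board."""
    rows = [r for r, row in enumerate(board) if any(row)]
    if not rows:
        return []
    cols = [c for c in range(len(board[0])) if any(row[c] for row in board)]
    return [row[cols[0]:cols[-1] + 1] for row in board[rows[0]:rows[-1] + 1]]

def segment_lines(image, gap=3.0):
    comps = connected_components(image)
    if not comps:
        return [], []
    centers = line_centers(regular_objects(comps), gap)
    boards = [[[0] * len(image[0]) for _ in image] for _ in centers]
    for pixels in comps:  # every CC, including small and large objects
        # Nearest line center by vertical distance, so a large handwritten
        # CC is copied onto exactly one board and is absent from the others.
        y = centroid_y(pixels)
        k = min(range(len(centers)), key=lambda i: abs(y - centers[i]))
        for r, c in pixels:
            boards[k][r][c] = 1
    return centers, [remove_extra_space(b) for b in boards]

# Hypothetical 13x17 image: two printed lines of 3x3 "glyphs", a one-pixel
# dot (small object), and a tall handwritten stroke spanning both lines.
img = [[0] * 17 for _ in range(13)]
for top in (1, 8):
    for left in (1, 5, 9):
        for r in range(top, top + 3):
            for c in range(left, left + 3):
                img[r][c] = 1
img[4][13] = 1
for r in range(3, 12):
    img[r][15] = 1
centers, boards = segment_lines(img)
print(centers)                                # [2.0, 9.0]
print([(len(b), len(b[0])) for b in boards])  # [(4, 13), (9, 15)]
```

In this toy image the stroke's centroid row (7.0) is closer to the second line center (9.0) than to the first (2.0). The stroke is therefore copied only onto the second board, which is consistent with the abstract's statement that a large handwritten object appears in one line and is absent from the others.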
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.