Patent · US Active

Line segmentation method applicable to document images containing handwriting and printed text characters or skewed text lines

US9104940B2 · kind B2 · utility

Cited by: 7
References: 5
Claims: 12
Family size: 0

Assignee

Inventor

Key dates

Filing date: Aug 30, 2013
Grant date: Aug 11, 2015
Priority date
Expiry date: Jan 31, 2034

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06V30/10
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A text line segmentation method for a document image containing printed text and handwriting, or a document image containing skewed lines of printed text. Connected components (CCs) are obtained for the document image, and their bounding boxes and centroids are calculated. The CCs are categorized into three categories based on bounding box size: small objects, regular text objects, and large objects involving handwriting. The centroids of the regular text objects are used in a cluster analysis to find the vertical centers of the N text lines. Each CC is then classified into one of the N lines based on the vertical distance between its centroid and the vertical centers of the text lines, and copied into a corresponding object board. Extra spaces are removed from the object boards to obtain the line segments. A large object involving handwriting is classified into one of the lines and is absent from the other lines.
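
The categorize, cluster, and assign steps described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that connected components have already been extracted and are given as bounding boxes (x, y, width, height), that the centroid can be approximated by the box center, that N is known in advance, and that a plain 1-D k-means stands in for the unspecified cluster analysis. The size thresholds (0.5 and 3.0 times the median height) and all function names are hypothetical. The object boards and the removal of extra spaces are omitted.

```python
from statistics import median


def centroid_y(box):
    """Vertical center of a bounding box (x, y, width, height).
    Assumption: the box center approximates the component's centroid."""
    _, y, _, h = box
    return y + h / 2.0


def categorize(boxes, small_ratio=0.5, large_ratio=3.0):
    """Label each box 'small', 'regular', or 'large' by its height relative
    to the median height. The ratios are illustrative assumptions."""
    med = median(h for _, _, _, h in boxes)
    labels = []
    for _, _, _, h in boxes:
        if h < small_ratio * med:
            labels.append("small")
        elif h > large_ratio * med:
            labels.append("large")
        else:
            labels.append("regular")
    return labels


def kmeans_1d(values, n, iterations=50):
    """Plain 1-D k-means (Lloyd's algorithm); returns sorted cluster centers.
    Assumes len(values) >= n."""
    values = sorted(values)
    # Seed the centers at evenly spaced positions in the sorted values.
    centers = [values[(2 * i + 1) * len(values) // (2 * n)] for i in range(n)]
    for _ in range(iterations):
        groups = [[] for _ in range(n)]
        for v in values:
            nearest = min(range(n), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a cluster happens to be empty.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)


def assign_lines(boxes, n_lines):
    """Find line centers from regular objects only, then assign every box
    (small, regular, or large) to the line with the nearest vertical center."""
    labels = categorize(boxes)
    regular = [centroid_y(b) for b, lab in zip(boxes, labels) if lab == "regular"]
    centers = kmeans_1d(regular, n_lines)
    lines = [[] for _ in range(n_lines)]
    for b in boxes:
        cy = centroid_y(b)
        idx = min(range(n_lines), key=lambda i: abs(cy - centers[i]))
        lines[idx].append(b)  # each box lands in exactly one line
    return centers, lines


# Made-up sample: two slightly skewed lines, one dot, one tall handwritten stroke.
boxes = [
    (0, 10, 15, 20), (20, 12, 15, 20), (40, 14, 15, 20), (60, 16, 15, 20),
    (0, 60, 15, 20), (20, 62, 15, 20), (40, 64, 15, 20), (60, 66, 15, 20),
    (25, 5, 4, 4),     # small object
    (80, 5, 40, 70),   # large object spanning both lines
]
centers, lines = assign_lines(boxes, n_lines=2)
print(centers)                      # [23.0, 73.0]
print([len(line) for line in lines])  # [6, 4]
```

In this sample the tall stroke overlaps both lines vertically, but because assignment uses only the distance from its centroid to the line centers, it ends up in the first line and does not appear in the second, which mirrors the last sentence of the abstract.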

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.