Patent · US Active

Method for segmenting text words in document images

US8965127B2 · kind B2 · utility

Cited by: 9
References: 2
Claims: 22
Family size: 0

Assignee

Inventors

Key dates

Filing date: Mar 14, 2013
Grant date: Feb 24, 2015
Priority date
Expiry date: Apr 20, 2033

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06V30/10
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A word segmentation method for processing a document image applies cluster analysis to the spacing segments of a line of text. The spacing segments are generated by thresholding a one-dimensional vertical projection profile of the line. Because the spacing lengths within a text line follow a bimodal distribution, a k-means clustering algorithm with the number of clusters pre-set to two is used to classify each spacing segment as either character spacing or word spacing. K-means++ initialization is used to improve the performance of the cluster analysis. The clustering results, such as the cluster centers and compactness, are used to prune cases such as single-word text lines and single table items. The locations of the word spacing segments are then used to segment the line of text into words.
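
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`spacing_segments`, `kmeans2_1d`, `segment_words`), the `threshold` and `min_ratio` parameters, and the pruning rule (a simple ratio test on the two cluster centers, standing in for the abstract's use of cluster centers and compactness) are all assumptions made for illustration. The input is assumed to be a binarized image of a single text line in which nonzero pixels are ink.

```python
import numpy as np

def spacing_segments(line_img, threshold=0):
    """Return (start, end) column ranges, end exclusive, where the vertical
    projection profile is <= threshold. Outer margins are ignored."""
    profile = (np.asarray(line_img) > 0).sum(axis=0)  # ink pixels per column
    blank = profile <= threshold
    ink_cols = np.flatnonzero(~blank)
    if ink_cols.size == 0:
        return []
    segments, start = [], None
    for x in range(ink_cols[0], ink_cols[-1] + 1):
        if blank[x] and start is None:
            start = x                      # a gap opens
        elif not blank[x] and start is not None:
            segments.append((start, x))    # the gap closes at the next ink column
            start = None
    return segments

def kmeans2_1d(values, rng, iters=50):
    """Two-cluster k-means on 1-D data with k-means++ seeding.
    Returns (centers sorted ascending, labels), so label 1 is the wider cluster."""
    x = np.asarray(values, dtype=float)
    c0 = x[rng.integers(len(x))]           # first seed: uniform at random
    d2 = (x - c0) ** 2
    if d2.sum() == 0:                      # all gaps identical: no second cluster
        return np.array([c0, c0]), np.zeros(len(x), dtype=int)
    # Second seed: probability proportional to squared distance from the first.
    c1 = x[rng.choice(len(x), p=d2 / d2.sum())]
    centers = np.array([c0, c1])
    for _ in range(iters):
        labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        new = np.array([x[labels == k].mean() if np.any(labels == k) else centers[k]
                        for k in range(2)])
        if np.allclose(new, centers):
            break
        centers = new
    centers = np.sort(centers)
    labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
    return centers, labels

def segment_words(line_img, min_ratio=2.0, seed=0):
    """Return the spacing segments classified as word spacing.
    An empty list means the line is treated as a single unit."""
    segs = spacing_segments(line_img)
    if len(segs) < 2:
        return []                          # too few gaps to cluster
    lengths = [end - start for start, end in segs]
    centers, labels = kmeans2_1d(lengths, np.random.default_rng(seed))
    # Assumed pruning rule: if the two centers are not well separated, the gaps
    # are taken to be all character spacing (e.g. a single-word line).
    if centers[1] < min_ratio * centers[0]:
        return []
    return [seg for seg, lab in zip(segs, labels) if lab == 1]
```

The column ranges returned by `segment_words` mark where the line would be cut into words. In this sketch, a line whose gaps all have similar widths returns an empty list, which approximates the pruning of single-word lines mentioned in the abstract.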

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.