Method for segmenting text words in document images
US8965127B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Mar 14, 2013 |
| Grant date | Feb 24, 2015 |
| Priority date | — |
| Expiry date | Apr 20, 2033 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG06V30/10
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
A word segmentation method for processing a document image applies clustering analysis to the spacing segments of a line. The spacing segments are generated by thresholding a one-dimensional vertical projection profile of the line. Taking advantage of the bimodal distribution of spacing length distribution of text lines, a k-means clustering algorithm is used, with the number of clusters pre-set to two, to classify the spacing segments as either character spacing or word spacing. Moreover, k-means++ initialization is used to enhance performance of cluster analysis. The clustering result such as cluster centers and compactness is used to prune single-word text line, single table item, etc. The locations of the word spacing segments are then used to segment the line of text into words.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.