Precise identification of text pixels from scanned document images
US7873215B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jun 27, 2007 |
| Grant date | Jan 18, 2011 |
| Priority date | — |
| Expiry date | Nov 17, 2029 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06V30/10
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A system or method for identifying text in a document. A group of connected components is created. A plurality of characteristics of different types is calculated for each connected component. Statistics are computed which describe the group of characteristics. Outlier components are identified as connected components whose computed characteristics are outside a statistical range. The outlier components are removed from the group of connected components. Text pixels are identified by segmenting pixels in the group of connected components into a group of text pixels and a group of background pixels.
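The pipeline in the abstract can be illustrated with a minimal sketch. It is not the patented implementation, and every specific in it is an assumption: the function names, the fixed ink threshold of 128, the three characteristics (area, height, width), and the two-standard-deviation range are all hypothetical choices. The final segmentation step is also simplified: pixels belonging to surviving components are treated as text, and everything else as background.

```python
from collections import deque
from statistics import mean, pstdev


def connected_components(mask):
    """Return 4-connected components of truthy cells as lists of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def characteristics(pixels):
    """Assumed characteristics: pixel count and bounding-box height and width."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return {
        "area": len(pixels),
        "height": max(ys) - min(ys) + 1,
        "width": max(xs) - min(xs) + 1,
    }


def remove_outliers(components, k=2.0):
    """Drop components with any characteristic more than k sigma from the mean."""
    feats = [characteristics(p) for p in components]
    keep = [True] * len(components)
    for name in ("area", "height", "width"):
        values = [f[name] for f in feats]
        mu, sigma = mean(values), pstdev(values)
        for i, v in enumerate(values):
            if abs(v - mu) > k * sigma:
                keep[i] = False
    return [p for p, ok in zip(components, keep) if ok]


def text_pixels(gray, ink_threshold=128, k=2.0):
    """Return the set of (row, col) pixels classified as text (simplified)."""
    # Assumption: dark pixels (below the threshold) are candidate ink.
    mask = [[v < ink_threshold for v in row] for row in gray]
    kept = remove_outliers(connected_components(mask), k)
    return {px for comp in kept for px in comp}
```

Calling `text_pixels(gray)` on a grayscale image stored as a list of rows of 0 to 255 values returns the coordinates kept as text. On a page with several similarly sized glyphs and one large dark block, such as a picture or a ruled box, the block would typically fall outside the statistical range and be discarded.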
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.