Removal of graphics from document images using heuristic text analysis and text recovery
US9355311B2 · kind B2 · utility
Assignee
Inventor
Key dates
| Event | Date |
|---|---|
| Filing date | Sep 23, 2014 |
| Grant date | May 31, 2016 |
| Priority date | — |
| Expiry date | Sep 23, 2034 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06T2207/30176
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A graphic removal process for document images involves two stages: first, removal of graphics from the document image based on heuristic text analyses; and second, text recovery, which restores text that was accidentally removed during the first stage. The first stage uses a relatively aggressive strategy to ensure that all graphic components are removed, which temporarily removes some text as well; the lost text is then recovered by the text recovery technique. The heuristic text analyses use the geometric properties of text characters and consider the properties of each character in relation to its neighbors. The text recovery technique starts from the text components that remain after the first stage (the intermediate document image) and recovers any connected component that is at least partially located within a predefined neighboring area around any of those text components.
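The abstract states the two stages only at a high level. The following Python sketch shows one way such a pipeline could be organized on a binary image (1 = ink, 0 = background). Everything concrete in it is an assumption made for illustration, not the patent's disclosed criteria: 8-connected labeling, the stage-1 heuristics (a character height band, an aspect-ratio limit, and a requirement for a similarly sized neighbor on the same line), the "neighboring area" taken as each surviving text component's bounding box grown by `margin` pixels, all threshold values, and every function name (`connected_components`, `looks_like_text`, `recover_text`, `remove_graphics_two_stage`, `sample_image`).

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """A connected component of foreground pixels with an inclusive bounding box."""
    label: int
    pixels: frozenset
    top: int
    left: int
    bottom: int
    right: int
    @property
    def height(self) -> int:
        return self.bottom - self.top + 1
    @property
    def width(self) -> int:
        return self.right - self.left + 1


def connected_components(image):
    """Label 8-connected foreground (1) pixels with a BFS, in raster-scan order."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    comps = []
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and image[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            ys, xs = [p[0] for p in pixels], [p[1] for p in pixels]
            comps.append(Component(len(comps), frozenset(pixels), min(ys), min(xs), max(ys), max(xs)))
    return comps


def looks_like_text(comp, comps, min_h, max_h, max_aspect, max_gap):
    """Hypothetical stage-1 test: character-sized, not elongated, with a same-line neighbor."""
    if not (min_h <= comp.height <= max_h):
        return False
    if max(comp.width / comp.height, comp.height / comp.width) > max_aspect:
        return False
    for other in comps:
        if other is comp or not (min_h <= other.height <= max_h):
            continue
        same_line = min(comp.bottom, other.bottom) >= max(comp.top, other.top)
        gap = max(other.left - comp.right, comp.left - other.right) - 1
        if same_line and gap <= max_gap:
            return True
    return False  # aggressive: anything not clearly text is treated as graphics


def recover_text(comps, text, margin):
    """Stage 2: restore removed components with any pixel inside a grown text bounding box."""
    areas = [(t.top - margin, t.left - margin, t.bottom + margin, t.right + margin) for t in text]
    kept = {t.label for t in text}
    def touches(c):
        return any(top <= y <= bot and lft <= x <= rgt for (top, lft, bot, rgt) in areas for (y, x) in c.pixels)
    recovered = [c for c in comps if c.label not in kept and touches(c)]
    return sorted(text + recovered, key=lambda c: c.label)


def remove_graphics_two_stage(image, min_h=3, max_h=10, max_aspect=3.0, max_gap=4, margin=2):
    """Return (cleaned image, stage-1 text components, final components); thresholds are assumptions."""
    comps = connected_components(image)
    text = [c for c in comps if looks_like_text(c, comps, min_h, max_h, max_aspect, max_gap)]
    final = recover_text(comps, text, margin)
    out = [[0] * len(image[0]) for _ in image]
    for comp in final:
        for y, x in comp.pixels:
            out[y][x] = 1
    return out, text, final


def sample_image():
    """Two glyph-like blocks, a small mark just below the first, and a long rule far away."""
    img = [[0] * 28 for _ in range(18)]
    for r in range(2, 7):
        for c in list(range(2, 5)) + list(range(7, 10)):
            img[r][c] = 1
    img[8][3] = 1              # small mark: removed by stage 1, recovered by stage 2
    for c in range(2, 26):
        img[15][c] = 1         # horizontal rule: removed by stage 1 and not recovered
    return img
```

On `sample_image()`, stage 1 keeps only the two glyph-like blocks (components 0 and 1) and discards both the single-pixel mark and the long horizontal rule; stage 2 then restores the mark, because it lies inside the grown box around the first glyph, while the rule stays removed. This mirrors the trade-off the abstract describes: stage 1 can be tuned aggressively, since components close to surviving text get a second chance in stage 2.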
Source: USPTO / EPO open patent data.