Systems and methods for handling and distinguishing binarized, background artifacts in the vicinity of document text and image features indicative of a document category
US8538184B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Nov 6, 2008 |
| Grant date | Sep 17, 2013 |
| Priority date | — |
| Expiry date | May 25, 2029 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06V30/1985
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method of enhancing electronic documents received from a plurality of users by a document analysis system, for improving automatic recognition and classification of the received electronic documents, is provided. For each page of a received electronic document, the method filters the page to infer binarized-background artifacts that result from the binarization of the original grayscale or color source document and that reside in the vicinity of binarized text and binarized image features in the page, so that the binarized text and binarized images may be distinguished from the binarized-background artifacts and extracted from the document. The method then uses the features extracted from the filtered document to automatically recognize and classify the document into a document category.
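The filtering step in the abstract can be illustrated with a small sketch. The abstract does not say how artifacts are inferred, so the code below swaps in a common stand-in technique: it treats very small connected ink components as presumed background artifacts and clears them. The function name `remove_small_components`, the `min_area` threshold, and the list-of-lists page representation (1 = ink, 0 = background) are all assumptions made for illustration and are not taken from the patent.

```python
from collections import deque


def remove_small_components(page, min_area=3):
    """Return a copy of a binary page with small ink components cleared.

    Assumption (not from the patent): an 8-connected ink component with
    fewer than `min_area` pixels is treated as a background artifact.
    """
    rows = len(page)
    cols = len(page[0]) if rows else 0
    cleaned = [row[:] for row in page]          # leave the input untouched
    seen = [[False] * cols for _ in range(rows)]

    for r in range(rows):
        for c in range(cols):
            if page[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill collects one connected component.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and page[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Components below the threshold are cleared as presumed artifacts.
            if len(component) < min_area:
                for y, x in component:
                    cleaned[y][x] = 0
    return cleaned
```

A size threshold is only the simplest possible criterion. A filter that follows the abstract more closely would also have to consider where a component sits relative to nearby text and image features, which this sketch does not attempt.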
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.