Handwritten document categorizer and method of training
US8566349B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Sep 28, 2009 |
| Grant date | Oct 22, 2013 |
| Priority date | — |
| Expiry date | Aug 27, 2031 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06V30/10
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method and an apparatus for training a handwritten document categorizer are disclosed. For each category in a set into which handwritten documents are to be categorized, discriminative words are identified from the OCR output of a training set of typed documents labeled by category. A group of keywords is established including some of the discriminative words identified for each category. Samples of each of the keywords in the group are synthesized using a plurality of different type fonts. A keyword model is then generated for each keyword, parameters of the model being estimated, at least initially, based on features extracted from the synthesized samples. Keyword statistics for each of a set of scanned handwritten documents labeled by category are generated by applying the generated keyword models to word images extracted from the scanned handwritten documents. The categorizer is trained with the keyword statistics and respective handwritten document labels.
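The text-side steps of the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not say how discriminative words are scored or which categorizer is used, so the smoothed log-ratio score and the nearest-centroid classifier below are assumptions chosen for brevity. Font synthesis and the image-based keyword models are left out: the sketch assumes the keyword models have already reported which keywords they detected in each handwritten document. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def discriminative_words(docs, top_k=2, smoothing=1.0):
    """Rank words per category by a smoothed log-ratio of in-category to
    out-of-category relative frequency (assumed scoring, not from the patent).

    docs: list of (category, ocr_text) pairs from typed training documents.
    """
    counts = {}
    for category, text in docs:
        counts.setdefault(category, Counter()).update(text.lower().split())
    vocab = set().union(*counts.values())
    result = {}
    for category, in_counts in counts.items():
        out_counts = Counter()
        for other, other_counts in counts.items():
            if other != category:
                out_counts.update(other_counts)
        in_total = sum(in_counts.values()) + smoothing * len(vocab)
        out_total = sum(out_counts.values()) + smoothing * len(vocab)

        def score(word):
            p_in = (in_counts[word] + smoothing) / in_total
            p_out = (out_counts[word] + smoothing) / out_total
            return math.log(p_in / p_out)

        # Ties are broken alphabetically so the ranking is deterministic.
        result[category] = sorted(vocab, key=lambda w: (-score(w), w))[:top_k]
    return result


def keyword_statistics(detections, keywords):
    """Relative frequency of each keyword among the detections that the
    (stubbed) keyword models reported for one handwritten document."""
    counts = Counter(d for d in detections if d in keywords)
    total = sum(counts.values()) or 1
    return [counts[k] / total for k in keywords]


def train_centroids(labeled_stats):
    """Average the keyword-statistics vectors per label (stand-in categorizer)."""
    sums, n = {}, Counter()
    for label, vec in labeled_stats:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        n[label] += 1
    return {label: [v / n[label] for v in acc] for label, acc in sums.items()}


def classify(centroids, vec):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(centroids[label], vec)),
    )


# Toy typed-document OCR output, labeled by category (hypothetical data).
typed_docs = [
    ("invoice", "invoice payment amount due payment"),
    ("invoice", "please remit payment for invoice"),
    ("complaint", "complaint about service refund refund"),
    ("complaint", "i request a refund complaint"),
]
per_category = discriminative_words(typed_docs)
keywords = sorted({w for words in per_category.values() for w in words})

# Keyword detections per labeled handwritten document (assumed model output).
handwritten = [
    ("invoice", ["payment", "invoice", "payment"]),
    ("complaint", ["refund", "complaint"]),
]
centroids = train_centroids(
    [(label, keyword_statistics(found, keywords)) for label, found in handwritten]
)
predicted = classify(centroids, keyword_statistics(["refund", "refund", "payment"], keywords))
```

In this toy run the keyword group is the union of the top two words per category, and a new document whose detections are mostly "refund" lands nearest the "complaint" centroid. In the method described by the abstract, the detections would instead come from applying keyword models, trained at least initially on multi-font synthesized samples, to word images extracted from the scanned handwritten documents.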
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.