Patent · US Active

Handwritten document categorizer and method of training

US8566349B2 · kind B2 · utility

Cited by: 5
References: 5
Claims: 23
Family size: 0

Key dates

Filing date: Sep 28, 2009
Grant date: Oct 22, 2013
Expiry date: Aug 27, 2031

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06V30/10
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method and an apparatus for training a handwritten document categorizer are disclosed. For each category in a set into which handwritten documents are to be categorized, discriminative words are identified from the OCR output of a training set of typed documents labeled by category. A group of keywords is established including some of the discriminative words identified for each category. Samples of each of the keywords in the group are synthesized using a plurality of different type fonts. A keyword model is then generated for each keyword, parameters of the model being estimated, at least initially, based on features extracted from the synthesized samples. Keyword statistics for each of a set of scanned handwritten documents labeled by category are generated by applying the generated keyword models to word images extracted from the scanned handwritten documents. The categorizer is trained with the keyword statistics and respective handwritten document labels.
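The training flow in the abstract (discriminative words per category, a pooled keyword group, per-document keyword statistics, then a categorizer trained on those statistics) can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the patented method. The abstract names no scoring criterion, statistic, or classifier, so this sketch assumes three of its own: a smoothed log-ratio score for "discriminative", a thresholded keyword frequency as the "keyword statistic", and a nearest-centroid classifier. The font-based sample synthesis and keyword-model estimation steps are omitted. Their output is stood in for by a hypothetical list of per-word-image score dictionaries. All function names are illustrative.

```python
import math
from collections import Counter


def discriminative_words(typed_docs_by_category, top_k=2, alpha=1.0):
    """Return the top_k most discriminative words per category.

    Assumption: words are scored by the smoothed log-ratio of their
    in-category frequency to their frequency in all other categories.
    """
    counts = {
        cat: Counter(w for doc in docs for w in doc.lower().split())
        for cat, docs in typed_docs_by_category.items()
    }
    vocab = set().union(*counts.values())
    v = len(vocab)
    result = {}
    for cat, inside in counts.items():
        outside = Counter()
        for other, cnt in counts.items():
            if other != cat:
                outside.update(cnt)
        n_in, n_out = sum(inside.values()), sum(outside.values())

        def score(w):
            p_in = (inside[w] + alpha) / (n_in + alpha * v)
            p_out = (outside[w] + alpha) / (n_out + alpha * v)
            return math.log(p_in / p_out)

        # Highest score first; ties broken alphabetically for determinism.
        result[cat] = sorted(vocab, key=lambda w: (-score(w), w))[:top_k]
    return result


def keyword_group(per_category):
    """Pool the per-category discriminative words into one sorted keyword list."""
    return sorted({w for words in per_category.values() for w in words})


def keyword_statistics(word_image_scores, keywords, threshold=0.5):
    """Per-document vector: for each keyword, the fraction of word images
    whose detection score meets the threshold.

    word_image_scores holds one dict per extracted word image, mapping
    keyword -> score; it is a hypothetical stand-in for keyword-model output.
    """
    n = len(word_image_scores)
    if n == 0:
        return [0.0] * len(keywords)
    return [
        sum(1 for s in word_image_scores if s.get(k, 0.0) >= threshold) / n
        for k in keywords
    ]


def train_nearest_centroid(vectors, labels):
    """Assumed classifier: average the keyword-statistics vectors per label."""
    sums, totals = {}, Counter()
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        totals[label] += 1
    return {lab: [x / totals[lab] for x in acc] for lab, acc in sums.items()}


def categorize(centroids, vec):
    """Pick the label whose centroid is nearest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(centroids[lab], vec)),
    )


if __name__ == "__main__":
    # Toy OCR output of typed training documents, labeled by category.
    typed = {
        "invoice": ["payment due invoice total", "invoice total amount due"],
        "letter": ["dear sir regards", "dear madam kind regards"],
    }
    keywords = keyword_group(discriminative_words(typed))
    # Toy stand-ins for keyword-model scores on labeled handwritten documents.
    train_docs = [
        [{"invoice": 0.9}, {"due": 0.7}, {"dear": 0.1}],
        [{"dear": 0.8}, {"regards": 0.6}],
    ]
    vectors = [keyword_statistics(d, keywords) for d in train_docs]
    model = train_nearest_centroid(vectors, ["invoice", "letter"])
    new_doc = [{"regards": 0.9}, {"dear": 0.2}]
    print(keywords)
    print(categorize(model, keyword_statistics(new_doc, keywords)))
```

Run as a script, the toy example prints `['dear', 'due', 'invoice', 'regards']` and then `letter`. The nearest-centroid step is only a placeholder; any classifier that accepts fixed-length feature vectors and document labels could be trained on the same keyword statistics.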

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.