Patent · US Active

Search and retrieval of documents indexed by optical character recognition

US8208765B2 · kind B2 · utility

Cited by: 3
References: 8
Claims: 15
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jan 10, 2008
Grant date: Jun 26, 2012
Priority date
Expiry date: Apr 28, 2031

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06V30/287
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

An image of a character string composed of M characters is clipped from a document image and divided into individual character images. Image features are extracted from each character image. Based on these features, the N (N > 1, an integer) most similar character images are selected as candidate characters, in descending order of similarity, from a character image feature dictionary that stores image features character by character, and a first index matrix of M×N cells is prepared. The candidate character string formed by the candidate characters in the first column of the first index matrix is subjected to lexical analysis according to a language model, whereby a second index matrix containing a character string that makes sense is prepared. In the language model, statistics are taken first, and the lexical analysis is then performed.
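
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented method. The three-dimensional feature vectors, the cosine similarity measure, the bigram statistics, the tiny corpus, and all names (FEATURE_DICT, first_index_matrix, second_index_matrix, and so on) are assumptions made for illustration only. The lexical analysis is approximated by scoring every combination of candidates with bigram counts and moving the best-scoring character of each row into the first column.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical character image feature dictionary: character -> toy feature vector.
FEATURE_DICT = {
    "c": (1.0, 0.0, 0.0), "e": (0.9, 0.1, 0.0),
    "a": (0.0, 1.0, 0.0), "o": (0.1, 0.9, 0.0),
    "t": (0.0, 0.0, 1.0), "f": (0.0, 0.1, 0.9),
}

def cosine(u, v):
    """Cosine similarity, used here as an assumed degree-of-similarity measure."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def first_index_matrix(char_features, feature_dict, n):
    """Build an M x N matrix: row = character position, columns = top-N candidates."""
    matrix = []
    for feat in char_features:
        ranked = sorted(feature_dict,
                        key=lambda ch: cosine(feat, feature_dict[ch]),
                        reverse=True)
        matrix.append(ranked[:n])
    return matrix

def bigram_counts(corpus):
    """Take statistics: count adjacent character pairs in a word list."""
    counts = Counter()
    for word in corpus:
        counts.update(zip(word, word[1:]))
    return counts

def score(chars, counts):
    """Sum of log(count + 1) over adjacent pairs; unseen pairs contribute 0."""
    return sum(math.log(counts[(a, b)] + 1) for a, b in zip(chars, chars[1:]))

def second_index_matrix(matrix, counts):
    """Reorder each row so the first column spells the best-scoring string.

    Brute force over N**M combinations; acceptable only for a toy example.
    """
    best = max(product(*matrix), key=lambda s: score(s, counts))
    return [[ch] + [c for c in row if c != ch] for ch, row in zip(best, matrix)]

# Noisy features for a three-character image (M = 3), with N = 2 candidates each.
observed = [(0.95, 0.06, 0.0), (0.06, 0.95, 0.0), (0.0, 0.0, 1.0)]
first = first_index_matrix(observed, FEATURE_DICT, n=2)
stats = bigram_counts(["cat", "car", "bat", "cab"])
second = second_index_matrix(first, stats)

print("".join(row[0] for row in first))   # first column before lexical analysis
print("".join(row[0] for row in second))  # first column after lexical analysis
```

With these toy inputs the first column of the first index matrix reads "eot", because the noisy features sit slightly closer to "e" and "o" than to "c" and "a". After the bigram-based adjustment the first column of the second index matrix reads "cat". A real system would replace the exhaustive search with a dynamic-programming decoder and a far larger language model.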

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.