Patent · US Active

Method and system for optical character recognition using image clustering

US8208726B2 · kind B2 · utility

Cited by: 9
References: 6
Claims: 20
Family size: 0

Key dates

Filing date: Jul 22, 2010
Grant date: Jun 26, 2012
Expiry date: Feb 1, 2031

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06V30/414
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

The present disclosure provides a computer-implemented method of translating an image-based electronic document into a text-based electronic document. The method includes electronically scanning an image-based document to determine positions of word images in the image-based document. The method also includes extracting the word images from the image-based document and storing the word images to an electronic storage device. The method also includes grouping a subset of the word images into a word cluster based on a similarity of the word images, wherein the word images in the word cluster correspond to a same actual word. The method also includes generating a character-encoded transcription for the word cluster based on the word images in the word cluster. The method also includes adding the character-encoded transcription to a text-based electronic document at locations corresponding to the positions of the word images in the image-based document.
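The abstract describes a pipeline: cluster similar word images, transcribe each cluster once, then write that transcription back at every member's position. The sketch below is a minimal, hypothetical illustration of that flow, not the patented implementation. It assumes the scanning and extraction steps have already produced `(position, word image)` pairs, where a position is `(line, x)` and a word image is a small binary bitmap.

The following are all illustrative assumptions, not details taken from the patent:

- the pixel-agreement `similarity` measure
- the greedy `cluster_words` pass and its `threshold`
- the majority vote over members
- the `recognize` callable, which stands in for any single-image OCR engine

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Bitmap = Tuple[Tuple[int, ...], ...]  # binary word image: rows of 0/1 pixels
Position = Tuple[int, int]            # (line index, x order) on the page
Word = Tuple[Position, Bitmap]


@dataclass
class WordCluster:
    representative: Bitmap                      # first image placed in the cluster
    members: List[Word] = field(default_factory=list)


def similarity(a: Bitmap, b: Bitmap) -> float:
    """Assumed metric: fraction of matching pixels; 0.0 if the shapes differ."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return 0.0
    total = sum(len(row) for row in a)
    if total == 0:
        return 1.0
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total


def cluster_words(words: List[Word], threshold: float = 0.9) -> List[WordCluster]:
    """Greedy clustering: join the most similar cluster at or above threshold, else start one."""
    clusters: List[WordCluster] = []
    for pos, img in words:
        best, best_score = None, -1.0
        for c in clusters:
            s = similarity(img, c.representative)
            if s >= threshold and s > best_score:
                best, best_score = c, s
        if best is None:
            best = WordCluster(representative=img)
            clusters.append(best)
        best.members.append((pos, img))
    return clusters


def transcribe_cluster(cluster: WordCluster, recognize: Callable[[Bitmap], str]) -> str:
    """One transcription per cluster: majority vote of the recognizer over all members."""
    votes = Counter(recognize(img) for _, img in cluster.members)
    return votes.most_common(1)[0][0]


def build_text_document(words: List[Word], recognize: Callable[[Bitmap], str],
                        threshold: float = 0.9) -> str:
    """Place each cluster's transcription at every member position, then emit text by line."""
    placed: Dict[Position, str] = {}
    for cluster in cluster_words(words, threshold):
        text = transcribe_cluster(cluster, recognize)
        for pos, _ in cluster.members:
            placed[pos] = text
    lines: Dict[int, List[str]] = {}
    for (line, _x), text in sorted(placed.items()):  # sort by line, then x
        lines.setdefault(line, []).append(text)
    return "\n".join(" ".join(lines[k]) for k in sorted(lines))


if __name__ == "__main__":
    # Toy bitmaps; THE_NOISY is THE with one pixel flipped (5/6 pixels agree).
    THE = ((1, 1, 1), (0, 1, 0))
    CAT = ((1, 0, 1), (1, 1, 1))
    THE_NOISY = ((1, 1, 1), (0, 1, 1))
    # Hypothetical recognizer that only knows the clean images.
    toy_ocr = lambda img: {THE: "the", CAT: "cat"}.get(img, "?")
    words = [((0, 0), THE), ((0, 1), CAT), ((1, 0), THE_NOISY), ((1, 1), THE)]
    print(build_text_document(words, toy_ocr, threshold=0.8))
```

Running the example prints `the cat` on the first line and `the the` on the second. The toy recognizer alone would return `?` for the noisy image, but that image lands in the "the" cluster and inherits the cluster's majority transcription. This illustrates why transcribing per cluster rather than per image can help on degraded scans. A real system would use a more robust image-matching measure than exact pixel agreement.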

Source: USPTO / EPO open patent data (bibliographic data and citation counts).