Method and system for optical character recognition using image clustering
US8208726B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jul 22, 2010 |
| Grant date | Jun 26, 2012 |
| Priority date | — |
| Expiry date | Feb 1, 2031 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG06V30/414
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
The present disclosure provides a computer-implemented method of translating an image-based electronic document into a text-based electronic document. The method includes electronically scanning an image-based document to determine positions of word images in the image-based document. The method also includes extracting the word images from the image-based document and storing the word images to an electronic storage device. The method also includes grouping a subset of the word images into a word cluster based on a similarity of the word images, wherein the word images in the word cluster correspond to a same actual word. The method also includes generating a character-encoded transcription for the word cluster based on the word images in the word cluster. The method also includes adding the character-encoded transcription to a text-based electronic document at locations corresponding to the positions of the word images in the image-based document.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.