Patent · US Active

Preprocessing images for OCR using character pixel height estimation and cycle generative adversarial networks for better character recognition

US11176410B2 · kind B2 · utility

1Cited by
3References
10Claims
0Family size

Assignee

Inventors

Key dates

Filing dateOct 27, 2019
Grant dateNov 16, 2021
Priority date
Expiry dateOct 27, 2039

Classification

  • Technology area (CPC G)Physics
  • CPC primaryG06V30/10
  • WIPO fieldComputer technology
  • WIPO sectorElectrical engineering

Abstract

A text extraction computing method that comprises calculating an estimated character pixel height of text from a digital image. The method may scale the digital image using the estimated character pixel height and a preferred character pixel height. The method may binarizes the digital image. The method may remove distortions using a neural network trained by a cycle GAN on a set of source text images and a set of clean text images. The set of source text images and clean text images are unpaired. The source text images may be distorted images of text. Calculating the estimated character pixel height may include summarizing the rows of pixels into a horizontal projection, and determining a line-repetition period from the projection, and quantifying the portion of the line-repetition period that corresponds to the text as the estimated character pixel height. The method may extract characters from the digital image using OCR.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.