Patent · US Active

Rapid language detection for characters in images of documents

US11995400B2 · kind B2 · utility

Cited by: 0
References: 2
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Sep 7, 2021
Grant date: May 28, 2024
Priority date
Expiry date: Oct 19, 2042

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06V30/19
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A computer-implemented method, according to one embodiment, includes: receiving an image having characters that correspond to a given language, and using a text recognition algorithm to determine a first language believed to correspond to the characters. A first confidence level associated with the first language is also computed, and a determination is made as to whether the first confidence level associated with the first language is outside a predetermined range. In response to determining that the first confidence level associated with the first language is not outside the predetermined range, the first language is output as the given language. The text recognition algorithm is trained using a simple shallow neural network and a generated mixed language corpus. The generated mixed language corpus is formed by: randomly sampling libraries having vocabulary and/or characters therein, and combining the randomly sampled vocabulary and/or characters to form the generated mixed language corpus.
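
The two mechanisms named in the abstract, the confidence check and the mixed language corpus, can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the toy vocabulary-overlap classifier (a stand-in for the shallow neural network), the per-token sampling scheme, and the range bounds 0.6 and 1.0 are all assumptions made for illustration. The abstract does not say what happens when the confidence falls outside the range, so the sketch simply returns None in that case.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Libraries = Dict[str, List[str]]          # language code -> vocabulary (hypothetical shape)
Classifier = Callable[[str], Tuple[str, float]]


def build_mixed_corpus(libraries: Libraries, n_lines: int,
                       tokens_per_line: int,
                       rng: random.Random) -> List[List[Tuple[str, str]]]:
    """Randomly sample tokens from per-language libraries and combine them.

    Each corpus line is a list of (token, language) pairs, so a line may mix
    several languages. The labeling scheme is an assumption, not the patent's.
    """
    languages = sorted(libraries)
    corpus = []
    for _ in range(n_lines):
        line = []
        for _ in range(tokens_per_line):
            lang = rng.choice(languages)
            line.append((rng.choice(libraries[lang]), lang))
        corpus.append(line)
    return corpus


def vocabulary_classifier(libraries: Libraries) -> Classifier:
    """Toy stand-in for the trained model: score by vocabulary overlap."""
    def classify(text: str) -> Tuple[str, float]:
        tokens = text.split()
        if not tokens:
            return "", 0.0
        counts = {lang: sum(tok in vocab for tok in tokens)
                  for lang, vocab in libraries.items()}
        best = max(counts, key=counts.get)
        # Confidence here is the share of tokens found in the best vocabulary.
        return best, counts[best] / len(tokens)
    return classify


def detect_language(text: str, classifier: Classifier,
                    low: float = 0.6, high: float = 1.0) -> Optional[str]:
    """Output the first language only if its confidence is inside [low, high]."""
    language, confidence = classifier(text)
    if low <= confidence <= high:
        return language
    return None  # outside the range: handling is not described in the abstract


if __name__ == "__main__":
    libs = {"en": ["the", "cat", "house"], "de": ["der", "katze", "haus"]}
    print(build_mixed_corpus(libs, 2, 4, random.Random(0)))
    print(detect_language("the cat house", vocabulary_classifier(libs)))
```

In this sketch the text is assumed to have already been extracted from the image; the character recognition step itself is out of scope.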

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.