Patent · US Active

Rapid language detection for characters in images of documents

US11995400B2 · kind B2 · utility

Cited by	0

References	2

Claims	20

Family size	0

Assignee

International Business Machines Corporation · US

Inventors

Zhong Fang Yuan · Xi'an, CN
Tong Liu · San Diego, US
Li Gao · Katy, US
Xiang Yang · Santa Clara, US
Qiang He · Chongqing, CN
Yu Pan · Shanghai, CN

Key dates

Filing date	Sep 7, 2021
Grant date	May 28, 2024
Priority date	—
Expiry date	Oct 19, 2042

Classification

Technology area (CPC G)	Physics
CPC primary	G06V30/19
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A computer-implemented method, according to one embodiment, includes: receiving an image having characters that correspond to a language, and using a text recognition algorithm to determine a first language believed to correspond to the characters. A first confidence level associated with the first language is also computed, and a determination is made as to whether the first confidence level associated with the first language is outside a predetermined range. In response to determining that the first confidence level associated with the first language is not outside the predetermined range, the first language is output as the given language. The text recognition algorithm is trained using a simple shallow neural network and a generated mixed language corpus. The generated mixed language corpus is formed by: randomly sampling libraries having vocabulary and/or characters therein, and combining the randomly sampled vocabulary and/or characters to form the generated mixed language corpus.
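The two mechanisms named in the abstract, building a mixed language corpus by random sampling and gating the detected language on a confidence range, can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the per-token language labels, the example libraries, and the range bounds (0.8 to 1.0) are all assumptions made for illustration. The abstract also does not say what happens when the confidence falls outside the range, so the sketch only reports that case through a flag.

```python
import random


def build_mixed_corpus(libraries, n_samples, tokens_per_sample, seed=0):
    """Randomly sample vocabulary/characters from per-language libraries
    and combine them into mixed-language samples.

    libraries: dict mapping a language name to a list of tokens.
    Returns a list of samples; each sample is a list of (token, language)
    pairs. Keeping the language label per token is an assumption.
    """
    rng = random.Random(seed)
    languages = sorted(libraries)
    corpus = []
    for _ in range(n_samples):
        sample = []
        for _ in range(tokens_per_sample):
            language = rng.choice(languages)        # pick a library at random
            token = rng.choice(libraries[language])  # pick an entry from it
            sample.append((token, language))
        corpus.append(sample)
    return corpus


def select_language(scores, low=0.8, high=1.0):
    """Pick the highest-scoring language and check its confidence.

    scores: dict mapping a language name to a confidence value.
    low/high: hypothetical bounds of the predetermined range.
    Returns (language, confidence, accepted); accepted is True when the
    confidence is not outside the range.
    """
    language = max(scores, key=scores.get)
    confidence = scores[language]
    accepted = low <= confidence <= high
    return language, confidence, accepted


if __name__ == "__main__":
    libs = {"en": ["the", "cat", "house"], "zh": ["猫", "的", "房子"]}
    for sample in build_mixed_corpus(libs, n_samples=2, tokens_per_sample=5):
        print(sample)
    print(select_language({"en": 0.93, "zh": 0.07}))
    print(select_language({"en": 0.55, "zh": 0.45}))
```

In this sketch the scores passed to `select_language` stand in for the output of the trained text recognition algorithm, which is not modeled here.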

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.