Rapid language detection for characters in images of documents
US11995400B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Sep 7, 2021 |
| Grant date | May 28, 2024 |
| Priority date | — |
| Expiry date | Oct 19, 2042 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG06V30/19
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
A computer-implemented method, according to one embodiment, includes: receiving an image having characters that correspond to a language, and using a text recognition algorithm to determine a first language believed to correspond to the characters. A first confidence level associated with the first language is also computed, and a determination is made as to whether the first confidence level associated with the first language is outside a predetermined range. In response to determining that the first confidence level associated with the first language is not outside the predetermined range, the first language is output as the given language. The text recognition algorithm is trained using a simple shallow neural network and a generated mixed language corpus. The generated mixed language corpus is formed by: randomly sampling libraries having vocabulary and/or characters therein, and combining the randomly sampled vocabulary and/or characters to form the generated mixed language corpus.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.