Detection of diacritics in OCR systems with assignment to the correct text line
US8977057B1 · kind B1 · utility
Assignee
Inventor
Key dates
| Event | Date |
|---|---|
| Filing date | Nov 9, 2012 |
| Grant date | Mar 10, 2015 |
| Priority date | — |
| Expiry date | Feb 6, 2033 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06V30/293
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A system and method of assigning diacritics in an electronic image using optical character recognition (OCR) is disclosed. In one example, the method comprises analyzing, by a computer system, the electronic image to generate a plurality of bounding blocks associated with text lines within the electronic image. The method further comprises establishing a plurality of bounding boxes for diacritics and base text within the electronic image. The method also comprises determining the distance from a diacritic to the nearest base text character and to the nearest text line. Finally, the method comprises evaluating the base box distance and the nearest text line distance to assign the diacritic to the correct text line in the electronic image.
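The abstract names the inputs (text-line bounding blocks, bounding boxes for diacritics and base text, and two distances) but not the distance metric or the decision rule. The sketch below shows one way those pieces could fit together. Everything in it is an assumption for illustration: the `Box`, `TextLine`, `box_gap`, and `assign_diacritic` names, the rectangle-gap distance, and the `bias` threshold are hypothetical and are not taken from the patent. The idea is to prefer the line that owns the nearest base character, because a mark sitting in the gap between two lines is often closer to the neighbouring line's block than to its own line.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned box in pixel coordinates (y grows downward)."""
    x0: float
    y0: float
    x1: float
    y1: float


def box_gap(a: Box, b: Box) -> float:
    """Euclidean gap between two boxes; 0.0 if they touch or overlap (assumed metric)."""
    dx = max(0.0, b.x0 - a.x1, a.x0 - b.x1)
    dy = max(0.0, b.y0 - a.y1, a.y0 - b.y1)
    return hypot(dx, dy)


@dataclass
class TextLine:
    block: Box          # bounding block of the whole text line
    chars: list[Box]    # bounding boxes of its base-text characters (must be non-empty)


def assign_diacritic(mark: Box, lines: list[TextLine], bias: float = 2.0) -> int:
    """Return the index of the text line a diacritic box should join.

    Hypothetical rule: keep the line owning the nearest base character unless
    the nearest line block is more than `bias` times closer than that character.
    """
    # Base box distance: nearest base-text character over all lines, with its owner.
    base_dist, base_line = min(
        (box_gap(mark, c), i) for i, line in enumerate(lines) for c in line.chars
    )
    # Text line distance: nearest line bounding block, with its index.
    line_dist, near_line = min(
        (box_gap(mark, line.block), i) for i, line in enumerate(lines)
    )
    if base_line == near_line or base_dist <= bias * line_dist:
        return base_line
    return near_line


if __name__ == "__main__":
    lines = [
        TextLine(Box(0, 0, 100, 20), [Box(0, 0, 10, 20), Box(80, 0, 90, 20)]),
        TextLine(Box(0, 40, 100, 60), [Box(50, 40, 60, 60)]),
    ]
    # Accent in the inter-line gap: line 0's block is nearer (6 px), but the
    # nearest base character (10 px away) belongs to line 1.
    print(assign_diacritic(Box(50, 26, 58, 30), lines))  # prints 1
    # Dot just above a character of line 0.
    print(assign_diacritic(Box(82, -6, 88, -2), lines))  # prints 0
```

With a smaller `bias` (for example `1.0`), the first mark would instead go to line 0, the nearest line block. The threshold therefore controls how strongly base-character proximity overrides line proximity. A real system would likely tune such a rule per script and font size.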
Source: USPTO / EPO open patent data (objective bibliographic data and citation counts).