Patent · US Active

Targeted optical character recognition (OCR) for medical terminology

US9633271B2 · kind B2 · utility

Cited by: 2
References: 11
Claims: 42
Family size: 0

Assignee

Inventor

Key dates

Filing date: Apr 28, 2016
Grant date: Apr 25, 2017
Priority date
Expiry date: Apr 28, 2036

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06T2207/20112
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Embodiments of the present invention provide concepts for correcting optical character recognition (OCR) errors from an OCR scan result by sequentially applying an anagram hash (AH) and Levenshtein-Distance (LD) measurement for concurrent character identity-based (machine code) and character shape-based (OCR-Key) corrections. The OCR-Key classifies characters by shape into one or more disjoint and overlapping classes. Similar shape-based classes appearing in consecutive characters are appended to a cardinality term, a repetition count of the class. The LD measurement groups OCR-Keys and differentiates on both class and cardinality to arrive at a shape-based mismatch error between competing candidate words from an associated dictionary and a target word from the OCR scan. The shape-based LD measurement errors are then functionally merged with the character identity-based deletion, substitution, and insertion errors to find a minimum error for the set of candidate words, corresponding to the preferred candidate word match to the target word.
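
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation, and everything specific in it is an assumption made for illustration: the anagram hash is taken to be the sum of fifth powers of code points (a common choice in the literature), the SHAPE_CLASS table is a toy, the cost values 0.5 and 1.0 are arbitrary, and the "functional merge" is taken to be a weighted sum. Candidate lookup here covers only one substituted or deleted character.

```python
from itertools import groupby
from string import ascii_lowercase

# Hypothetical shape classes; the abstract does not give the actual table.
SHAPE_CLASS = {ch: cls
               for cls, chars in {"i": "il1It|!", "o": "oO0ce", "n": "nmhru"}.items()
               for ch in chars}

def anagram_hash(word):
    # Assumed form: order-independent sum of code points to the 5th power.
    return sum(ord(c) ** 5 for c in word)

def ocr_key(word):
    # Map each character to its shape class, then collapse consecutive
    # repeats of a class into (class, cardinality) pairs.
    classes = [SHAPE_CLASS.get(c, c) for c in word]
    return [(cls, len(list(run))) for cls, run in groupby(classes)]

def levenshtein(a, b, sub_cost):
    # Standard dynamic-programming edit distance with a pluggable
    # substitution cost; insertions and deletions cost 1.
    prev = [float(j) for j in range(len(b) + 1)]
    for i, x in enumerate(a, 1):
        cur = [float(i)]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + sub_cost(x, y)))
        prev = cur
    return prev[-1]

def char_cost(x, y):
    return 0.0 if x == y else 1.0

def key_cost(x, y):
    # Differentiate on both class and cardinality (assumed cost values).
    if x == y:
        return 0.0
    return 0.5 if x[0] == y[0] else 1.0

def candidates(target, index, alphabet=ascii_lowercase):
    # Look up dictionary words whose hash matches the target's hash after
    # at most one character is removed or swapped for another.
    h = anagram_hash(target)
    found = set(index.get(h, ()))
    for c in set(target):
        base = h - ord(c) ** 5
        found.update(index.get(base, ()))
        for r in alphabet:
            found.update(index.get(base + ord(r) ** 5, ()))
    return found

def best_candidate(target, words, weight=1.0):
    # Assumed merge: character-level error plus weighted shape-level error.
    def score(w):
        return (levenshtein(target, w, char_cost)
                + weight * levenshtein(ocr_key(target), ocr_key(w), key_cost))
    return min(sorted(words), key=score)

DICTIONARY = ["medical", "medial", "radical", "decimal", "claimed"]
INDEX = {}
for w in DICTIONARY:
    INDEX.setdefault(anagram_hash(w), []).append(w)

cands = candidates("medica1", INDEX)   # OCR misread "l" as the digit "1"
print(sorted(cands))
print(best_candidate("medica1", cands))
```

In this toy run the hash lookup returns "medical" together with its anagrams "claimed" and "decimal", because the hash ignores character order. The combined distance then selects "medical": its OCR-Key is identical to the target's, since "1" and "l" fall in the same assumed shape class, and only one character-level substitution remains.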

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.