Patent · US Active

Structural clustering and alignment of OCR results

US10824899B2 · kind B2 · utility

0Cited by
7References
20Claims
0Family size

Assignee

Inventors

Key dates

Filing dateDec 27, 2018
Grant dateNov 3, 2020
Priority date
Expiry dateApr 30, 2039

Classification

  • Technology area (CPC G)Physics
  • CPC primaryG06V30/10
  • WIPO fieldComputer technology
  • WIPO sectorElectrical engineering

Abstract

Representative embodiments disclose mechanisms to create a text stream from raw OCR outputs. The raw OCR output comprises a plurality of bounding boxes, each bounding box defining a region containing text which has been recognized by the OCR system. A weight matrix is calculated that comprises a weight for each pair of bounding boxes. The weight representing the probability that a pair of bounding boxes belongs to the same cluster. The bounding boxes are then clustered along the weights. The resulting clusters are first ordered using an ordering criteria. The bounding boxes within each cluster are then ordered according to a second ordering criteria. The ordered clusters and bounding boxes are then arranged into a text stream.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.