Structural clustering and alignment of OCR results
US10824899B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Dec 27, 2018 |
| Grant date | Nov 3, 2020 |
| Priority date | — |
| Expiry date | Apr 30, 2039 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG06V30/10
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
Representative embodiments disclose mechanisms to create a text stream from raw OCR outputs. The raw OCR output comprises a plurality of bounding boxes, each bounding box defining a region containing text which has been recognized by the OCR system. A weight matrix is calculated that comprises a weight for each pair of bounding boxes. The weight representing the probability that a pair of bounding boxes belongs to the same cluster. The bounding boxes are then clustered along the weights. The resulting clusters are first ordered using an ordering criteria. The bounding boxes within each cluster are then ordered according to a second ordering criteria. The ordered clusters and bounding boxes are then arranged into a text stream.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.