Text extraction using optical character recognition
US11961316B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | May 10, 2022 |
| Grant date | Apr 16, 2024 |
| Priority date | — |
| Expiry date | Jul 3, 2042 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06V30/41
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Provided herein are systems and methods for extracting text from a document. Different optical character recognition (OCR) tools are used to extract different versions of the text in the document. Metrics evaluating the quality of the extracted text are compared to identify and select higher quality extracted text. A selected portion of text is compared to a threshold to ensure minimal quality. The selected portion of text is then saved. Error correction can be applied to the selected portion of text based on errors specific to the OCR tools or the document contents.
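The flow described in the abstract (run several OCR tools, score each output, keep the best one if it clears a threshold, then apply tool-specific corrections) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented method: the quality metric (share of words found in a known vocabulary), the tool names, the threshold value, and the correction tables are all hypothetical, and the OCR outputs are supplied as plain strings rather than produced by a real OCR engine.

```python
from typing import Dict, Optional, Set


def dictionary_hit_rate(text: str, vocabulary: Set[str]) -> float:
    """Hypothetical quality metric: fraction of words found in a vocabulary."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    return sum(w in vocabulary for w in words) / len(words)


def apply_corrections(text: str, corrections: Dict[str, str]) -> str:
    """Replace known misreadings (e.g. '0' read in place of 'o')."""
    for wrong, right in corrections.items():
        text = text.replace(wrong, right)
    return text


def extract_text(
    candidates: Dict[str, str],               # tool name -> text it extracted
    vocabulary: Set[str],
    threshold: float,
    tool_corrections: Dict[str, Dict[str, str]],
) -> Optional[str]:
    """Pick the highest-scoring candidate, enforce a minimum quality,
    then apply corrections specific to the tool that produced it."""
    if not candidates:
        return None
    # Score every version of the text and compare the metrics.
    scores = {
        tool: dictionary_hit_rate(text, vocabulary)
        for tool, text in candidates.items()
    }
    best_tool = max(scores, key=scores.get)
    # Reject the selection if even the best version is below the threshold.
    if scores[best_tool] < threshold:
        return None
    return apply_corrections(
        candidates[best_tool], tool_corrections.get(best_tool, {})
    )


# Assumed example data: two made-up tools reading the same line of a document.
vocabulary = {"invoice", "total", "amount", "due"}
candidates = {
    "tool_a": "Invoice t0tal amount due",
    "tool_b": "lnvoice t0tal arnount due",
}
tool_corrections = {"tool_a": {"t0tal": "total"}}

result = extract_text(candidates, vocabulary, 0.5, tool_corrections)
print(result)  # Invoice total amount due
```

Here `tool_a` scores 0.75 and `tool_b` scores 0.25, so `tool_a` is selected, passes the 0.5 threshold, and has its single known misreading corrected before the text would be saved. Raising the threshold above 0.75 makes `extract_text` return `None`, which stands in for rejecting text that does not meet the minimum quality.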
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.