Document structure identification using post-processing error correction
US11321559B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Oct 17, 2019 |
| Grant date | May 3, 2022 |
| Priority date | — |
| Expiry date | Sep 30, 2040 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06V30/414
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Techniques are disclosed for identifying document structural elements and correcting errors in the classification and/or location of the identified structural elements. An example method includes determining location and classification for a structural element on a page of the document using a machine learning (ML) model; determining one or more errors in the location and/or classification for the structural element; and correcting each instance of the one or more errors using other content in the document (e.g., content spatially adjacent to the corresponding structural element on the page of the document). The method may further include storing the document and the location and classification (as corrected), and/or generating a structural map of the page of the document based on the location and classification (as corrected). The use of the document content to correct errors greatly enhances the agreement between the identified structural elements and the original document.
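The correction step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not say how errors are detected or corrected, so everything here is an assumption made for illustration, including the `Element` type, the `score` field, the thresholds `max_gap` and `min_score`, and the neighbor-agreement rule itself. The sketch takes elements already labeled by some ML model, relabels a low-confidence element when the elements directly above and below it agree with each other, and then emits a simple structural map of the page.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class Element:
    """A structural element as a hypothetical ML model might report it."""
    label: str    # e.g. "heading", "paragraph", "list_item"
    x0: float     # left edge
    y0: float     # top edge (y grows downward)
    x1: float     # right edge
    y1: float     # bottom edge
    score: float  # assumed model confidence in [0, 1]


def horizontally_overlaps(a: Element, b: Element) -> bool:
    """True when the two boxes share some horizontal extent."""
    return min(a.x1, b.x1) - max(a.x0, b.x0) > 0


def correct_labels(elements: List[Element],
                   max_gap: float = 12.0,
                   min_score: float = 0.6) -> List[Element]:
    """Relabel low-confidence elements that disagree with both neighbors.

    Assumed heuristic: if the elements directly above and below share a
    label, sit close to the current element, and overlap it horizontally,
    the current element is treated as misclassified and takes their label.
    """
    ordered = sorted(elements, key=lambda e: e.y0)
    corrected = list(ordered)
    for i in range(1, len(ordered) - 1):
        # Compare against the original labels so one fix cannot cascade.
        prev, cur, nxt = ordered[i - 1], ordered[i], ordered[i + 1]
        neighbors_agree = prev.label == nxt.label and cur.label != prev.label
        close = (cur.y0 - prev.y1) <= max_gap and (nxt.y0 - cur.y1) <= max_gap
        aligned = horizontally_overlaps(prev, cur) and horizontally_overlaps(cur, nxt)
        if neighbors_agree and close and aligned and cur.score < min_score:
            corrected[i] = replace(cur, label=prev.label)
    return corrected


def structural_map(elements: List[Element]) -> List[Dict[str, object]]:
    """Build a simple per-page map of label and bounding box."""
    return [{"label": e.label, "bbox": (e.x0, e.y0, e.x1, e.y1)} for e in elements]


if __name__ == "__main__":
    page = [
        Element("list_item", 50, 100, 300, 112, 0.9),
        Element("paragraph", 50, 116, 300, 128, 0.4),  # likely a misclassified list item
        Element("list_item", 50, 132, 300, 144, 0.9),
    ]
    print([e.label for e in correct_labels(page)])
```

Only the classification is corrected in this sketch. A location correction, which the abstract also mentions, could follow the same pattern by adjusting a bounding box against its neighbors, but the abstract gives no detail on how that is done.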
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.