Automated transformation of information from images to textual representations, and applications therefor
US12197412B2 · kind B2 · utility
Key dates
| Event | Date |
|---|---|
| Filing date | Jul 3, 2024 |
| Grant date | Jan 14, 2025 |
| Priority date | — |
| Expiry date | Jul 3, 2044 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06V30/414
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Recent developments in machine learning (commonly called “artificial intelligence” or “AI”) have vastly expanded the applications of this technology, such as myriad “chat” agents adept at understanding natural human language. State-of-the-art generative models can parse text queries from a user and provide comprehensive, accurate responses, including generating images that depict desired content. However, current implementations struggle to understand all of the information present in images of documents, especially images of business documents. In particular, generative models fail to understand structured and semi-structured information, e.g., as indicated by graphical information such as lines, geometric relationships (e.g., as indicated by tables, graphs, and figures), formatting, and other contextual information that human readers understand easily and implicitly. The disclosed inventive concepts transform structured and semi-structured information, along with textual content, into a textual representation. This representation allows generative models to better understand both the textual content and the non-textual structured information present in document images.
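The abstract does not disclose how the claimed transformation works. The following Python snippet is a deliberately simple sketch of the general idea only, and is not the patented method. It shows how spatial layout can be serialized into text a language model can read.

The sketch makes several assumptions:

- OCR has already produced word-level text with coordinates.
- Words whose vertical centres lie within a tolerance belong to the same table row.
- Each word is treated as its own cell.

The names `WordBox`, `boxes_to_text`, `row_tol`, and `col_sep` are all hypothetical. A real system would also merge multi-word cells and use ruling lines and other graphical cues.

```python
from dataclasses import dataclass


@dataclass
class WordBox:
    """Hypothetical OCR output: one word with its left edge (x) and vertical centre (y)."""
    text: str
    x: float
    y: float


def boxes_to_text(words, row_tol=5.0, col_sep=" | "):
    """Serialize word boxes into delimited text lines that preserve row/column layout.

    Assumption: words whose y values are within `row_tol` of a row's first word
    belong to that row; each word is emitted as a separate cell.
    """
    rows = []
    # Visit words top-to-bottom (then left-to-right) so rows are built in reading order.
    for w in sorted(words, key=lambda w: (w.y, w.x)):
        if rows and abs(rows[-1][0].y - w.y) <= row_tol:
            rows[-1].append(w)  # close enough vertically: same row
        else:
            rows.append([w])  # start a new row
    lines = []
    for row in rows:
        row.sort(key=lambda w: w.x)  # restore left-to-right column order
        lines.append(col_sep.join(w.text for w in row))
    return "\n".join(lines)


if __name__ == "__main__":
    words = [WordBox("Item", 10, 100), WordBox("Qty", 200, 101),
             WordBox("Widget", 10, 130), WordBox("4", 200, 129)]
    print(boxes_to_text(words))
```

Run on the sample above, the sketch prints `Item | Qty` on one line and `Widget | 4` on the next. The small vertical misalignments between words in the same row do not break the grouping. Delimited rows are only one possible textual encoding; Markdown tables, JSON, or explicit coordinate annotations are equally plausible choices.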
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.