Patent · US Active

Automated transformation of information from images to textual representations, and applications therefor

US12197412B2 · kind B2 · utility

Cited by: 1
References: 13
Claims: 27
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jul 3, 2024
Grant date: Jan 14, 2025
Priority date
Expiry date: Jul 3, 2044

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06V30/414
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Recent developments in machine learning (commonly termed “artificial intelligence” or “AI”) have vastly expanded applications for this technology, such as myriad “chat” agents adept at understanding natural human language. While state-of-the-art generative models can parse text queries from a user and provide comprehensive, accurate responses (including generating images depicting desired content), current implementations struggle to understand all of the information present in images of documents, especially images of business documents. In particular, generative models fail to understand structured and semi-structured information, e.g., as indicated by graphical information such as lines, geometric relationships (e.g., indicated by tables, graphs, figures, etc.), formatting, and other contextual information that human readers easily and implicitly understand. The disclosed inventive concepts transform structured and semi-structured information, along with textual content, into a textual representation that allows generative models to better understand both the textual content and the non-textual structured information present in document images.
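
As a rough illustration of the general idea in the abstract (turning geometric layout into text that a language model can read), the sketch below serializes positioned OCR words into pipe-delimited rows. It is not the patented method, which this record does not describe in detail. The `Word` type, the `layout_to_text` function, and the `row_tol` and `col_gap` thresholds are all hypothetical names and values chosen for this example, and the input is assumed to come from some upstream OCR step that yields bounding-box coordinates.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR word with a simplified bounding box (hypothetical type)."""
    text: str
    x0: float  # left edge
    x1: float  # right edge
    y: float   # top edge


def layout_to_text(words, row_tol=5.0, col_gap=20.0):
    """Serialize positioned words into pipe-delimited rows.

    Assumptions (illustrative only): words whose top edges lie within
    `row_tol` of the first word in a row belong to that row, and a
    horizontal gap wider than `col_gap` marks a new cell.
    """
    # Group words into rows, scanning top to bottom.
    rows = []
    for w in sorted(words, key=lambda w: (w.y, w.x0)):
        if rows and abs(w.y - rows[-1][0].y) <= row_tol:
            rows[-1].append(w)
        else:
            rows.append([w])

    lines = []
    for row in rows:
        row.sort(key=lambda w: w.x0)  # left-to-right within the row
        cells = [row[0].text]
        prev = row[0]
        for w in row[1:]:
            if w.x0 - prev.x1 > col_gap:
                cells.append(w.text)       # wide gap: start a new cell
            else:
                cells[-1] += " " + w.text  # narrow gap: same cell
            prev = w
        lines.append(" | ".join(cells))
    return "\n".join(lines)


sample = [
    Word(text="Item", x0=10, x1=40, y=100),
    Word(text="Qty", x0=200, x1=230, y=101),
    Word(text="Blue", x0=10, x1=40, y=120),
    Word(text="pen", x0=45, x1=70, y=121),
    Word(text="3", x0=200, x1=210, y=120),
]
print(layout_to_text(sample))
```

The resulting string keeps the row and column relationships that plain reading-order OCR text would lose, so it could be placed in a prompt alongside a question about the document. Real documents would need handling for ruled lines, merged cells, and multi-line cells, none of which this sketch attempts.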

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.