Generating synthetic training data including document images with key-value pairs
US12374139B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Nov 28, 2022 |
| Grant date | Jul 29, 2025 |
| Priority date | — |
| Expiry date | Nov 18, 2043 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06V30/41
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Automated techniques are described for generating a large volume of diverse training data that can be used to train machine learning (ML) models to extract key-value (KV) pairs from document images. Given a single input document image and its associated annotation data, a synthetic data generation system automatically generates a large number of diverse synthetic training datapoints, each including a synthetic document image and associated annotation data. The generated synthetic training datapoints can be used to train ML models for extracting KV pairs from document images and to improve their performance. In certain implementations, multiple synthetic datapoints are generated by varying the values associated with a key for a content item within the input document image.
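The value-variation idea in the last sentence of the abstract can be illustrated with a minimal Python sketch. It is not the patented method: it covers only the annotation side and leaves out rendering the new value into a synthetic document image. The annotation layout (`items`, `key`, `value`, `value_box`), the `value_pool` mapping, the file name, and the function `generate_variants` are all hypothetical names chosen for illustration.

```python
import copy
import random


def generate_variants(annotation, value_pool, n, seed=0):
    """Return n copies of `annotation`, each with its values resampled per key.

    Assumption: `annotation` is a dict with an "items" list, and each item
    holds a "key" and a "value". Keys absent from `value_pool` are left as-is.
    """
    rng = random.Random(seed)  # seeded so the output is reproducible
    variants = []
    for _ in range(n):
        variant = copy.deepcopy(annotation)  # never mutate the input datapoint
        for item in variant["items"]:
            candidates = value_pool.get(item["key"])
            if candidates:
                item["value"] = rng.choice(candidates)
        variants.append(variant)
    return variants


# Hypothetical input datapoint: one document image plus its KV annotations.
annotation = {
    "image": "invoice_001.png",
    "items": [
        {"key": "Invoice No", "value": "A-1001", "value_box": [320, 40, 420, 60]},
        {"key": "Total", "value": "$125.00", "value_box": [320, 500, 400, 520]},
    ],
}

# Hypothetical candidate values for each key.
value_pool = {
    "Invoice No": ["B-2040", "C-7781", "Z-0009"],
    "Total": ["$8.50", "$1,204.99", "$77.00"],
}

synthetic = generate_variants(annotation, value_pool, n=5)
```

A fuller pipeline would also draw each new value into the image at `value_box` and adjust the box when the replacement text has a different width, so that the image and the annotation remain consistent.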
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.