Patent · US Active

Generating synthetic training data including document images with key-value pairs

US12374139B2 · kind B2 · utility

Cited by: 0
References: 8
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Nov 28, 2022
Grant date: Jul 29, 2025
Priority date
Expiry date: Nov 18, 2043

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06V30/41
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Automated techniques are described for generating a large volume of diverse training data that can be used to train machine learning (ML) models to extract key-value (KV) pairs from document images. Given a single input document image and its associated annotation data, a synthetic data generation system automatically generates a large number of diverse synthetic training datapoints, each including a synthetic document image and associated annotation data. The generated synthetic training datapoints can be used to train ML models for extracting KV pairs from document images and to improve their performance. In certain implementations, multiple synthetic datapoints are generated by varying the values associated with a key for a content item within the input document image.
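
The value-variation idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the names KVAnnotation, generate_synthetic_annotations and value_pools are hypothetical, the sample keys and values are invented for illustration, and rendering the new value into a synthetic document image is omitted. The sketch only shows how one annotated seed document could yield many annotation variants by swapping the value attached to a key.

```python
import random
from dataclasses import dataclass, replace
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class KVAnnotation:
    """One key-value content item (hypothetical schema, not from the patent)."""
    key: str
    value: str
    bbox: Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels, assumed layout


def generate_synthetic_annotations(
    annotations: Sequence[KVAnnotation],
    value_pools: Dict[str, Sequence[str]],
    count: int,
    seed: int = 0,
) -> List[List[KVAnnotation]]:
    """Return `count` annotation variants of a single seed document.

    For every key listed in `value_pools`, the value is replaced by a random
    choice from that key's pool; all other items are copied unchanged.
    Drawing the new text into the image is left out of this sketch.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    datapoints = []
    for _ in range(count):
        variant = [
            replace(item, value=rng.choice(value_pools[item.key]))
            if item.key in value_pools
            else item
            for item in annotations
        ]
        datapoints.append(variant)
    return datapoints


# Invented example data: one seed document with two annotated KV pairs.
seed_annotations = [
    KVAnnotation("Invoice No", "INV-001", (40, 30, 200, 50)),
    KVAnnotation("Total", "$120.00", (40, 400, 200, 420)),
]
pools = {"Total": ["$15.75", "$980.10", "$42.00"]}
synthetic = generate_synthetic_annotations(seed_annotations, pools, count=5)
```

Each element of synthetic is a full annotation list for one synthetic datapoint. A complete pipeline would also need to re-render the replaced value inside its bounding box so that the image and the annotation stay consistent.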

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.