Patent · US Active

Systems and methods for field extraction from unlabeled data

US12086698B2 · kind B2 · utility

Cited by: 0
References: 1
Claims: 18
Family size: 0

Assignee

Inventors

Key dates

Filing date: Sep 24, 2021
Grant date: Sep 10, 2024
Priority date
Expiry date: Jul 18, 2042

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06V30/413
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A field extraction system that does not require field-level annotations for training is provided. Specifically, the training process is bootstrapped by mining pseudo-labels from unlabeled forms using simple rules. Then, a transformer-based structure is used to model interactions between text tokens in the input form and predict a field tag for each token accordingly. The pseudo-labels are used to supervise the transformer training. As the pseudo-labels are noisy, a refinement module that contains a sequence of branches is used to refine the pseudo-labels. Each of the refinement branches conducts field tagging and generates refined labels. At each stage, a branch is optimized by the labels ensembled from all previous branches to reduce label noise.
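
The two label-handling steps in the abstract, rule-based pseudo-label mining and stage-wise label ensembling, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the abstract does not say which rules are used or how labels are ensembled, so the key-phrase rule, the majority vote, and the names FIELD_KEYS, mine_pseudo_labels, ensemble_labels, and staged_targets are all hypothetical. The transformer and the refinement branches are not modeled; fixed branch predictions stand in for them.

```python
from collections import Counter

# Hypothetical key phrases per field; the abstract only says "simple rules".
FIELD_KEYS = {"invoice_no": ["invoice", "no"], "total": ["total"]}
BACKGROUND = "O"  # tag for tokens that belong to no field


def mine_pseudo_labels(tokens, field_keys=FIELD_KEYS):
    """Assumed rule: tag the token right after a matched key phrase."""
    labels = [BACKGROUND] * len(tokens)
    lowered = [t.lower().strip(":") for t in tokens]
    for field, key in field_keys.items():
        n = len(key)
        # Stop early enough that a token always follows the key phrase.
        for i in range(len(tokens) - n):
            if lowered[i:i + n] == key:
                labels[i + n] = field
    return labels


def ensemble_labels(label_sets):
    """Assumed ensembling: per-token majority vote, ties to the earliest set."""
    ensembled = []
    for votes in zip(*label_sets):
        counts = Counter(votes)
        best = max(counts.values())
        ensembled.append(next(v for v in votes if counts[v] == best))
    return ensembled


def staged_targets(pseudo, branch_preds):
    """Target for branch k is the ensemble of the pseudo-labels and the
    outputs of branches 1..k-1 (branch outputs are given, not trained)."""
    targets, history = [], [pseudo]
    for pred in branch_preds:
        targets.append(ensemble_labels(history))
        history.append(pred)
    return targets


tokens = ["Invoice", "No:", "12345", "Total:", "$40.00"]
pseudo = mine_pseudo_labels(tokens)
# Stand-in predictions from two refinement branches.
branch_preds = [
    ["O", "O", "invoice_no", "total", "total"],
    ["O", "O", "O", "O", "total"],
]
targets = staged_targets(pseudo, branch_preds)
```

In this toy run the first branch is supervised by the mined pseudo-labels alone, and the second by the vote between the pseudo-labels and the first branch's output, which mirrors the abstract's description of each branch being optimized on labels ensembled from all previous branches.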

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.