Systems and methods for field extraction from unlabeled data
US12086698B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Sep 24, 2021 |
| Grant date | Sep 10, 2024 |
| Priority date | — |
| Expiry date | Jul 18, 2042 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG06V30/413
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
A field extraction system that does not require field-level annotations for training is provided. Specifically, the training process is bootstrapped by mining pseudo-labels from unlabeled forms using simple rules. Then, a transformer-based structure is used to model interactions between text tokens in the input form and predict a field tag for each token accordingly. The pseudo-labels are used to supervise the transformer training. As the pseudo-labels are noisy, a refinement module that contains a sequence of branches is used to refine the pseudo-labels. Each of the refinement branches conducts field tagging and generates refined labels. At each stage, a branch is optimized by the labels ensembled from all previous branches to reduce label noise.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.