Patent · US Active

Representative document hierarchy generation

US11580763B2 · kind B2 · utility

2Cited by
1References
20Claims
0Family size

Assignee

Inventors

Key dates

Filing dateMay 18, 2020
Grant dateFeb 14, 2023
Priority date
Expiry dateFeb 19, 2041

Classification

  • Technology area (CPC G)Physics
  • CPC primaryG06V2201/10
  • WIPO fieldComputer technology
  • WIPO sectorElectrical engineering

Abstract

In some aspects, a method includes performing optical character recognition (OCR) based on data corresponding to a document to generate text data, detecting one or more bounded regions from the data based on a predetermined boundary rule set, and matching one or more portions of the text data to the one or more bounded regions to generate matched text data. Each bounded region of the one or more bounded regions encloses a corresponding block of text. The method also includes extracting features from the matched text data to generate a plurality of feature vectors and providing the plurality of feature vectors to a trained machine-learning classifier to generate one or more labels associated with the one or more bounded regions. The method further includes outputting metadata indicating a hierarchical layout associated with the document based on the one or more labels and the matched text data.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.