Patent · US Active

Hierarchical document parsing and metadata generation for machine learning applications

US12393637B1 · kind B1 · utility

Cited by: 0
References: 3
Claims: 21
Family size: 0

Assignee

Inventor

Key dates

Filing date: Dec 31, 2024
Grant date: Aug 19, 2025
Priority date
Expiry date: Dec 31, 2044

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/258
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A document is analyzed to identify formatting attributes of the document. A set of statistical measures is calculated for each of the formatting attributes. A correlation between these statistical measures and elements of the document is identified. A hierarchical relationship is determined between the elements of the document. Data from the document is split into chunks using the hierarchical relationship. A representation of the document that includes the resulting hierarchical structure is then generated. The representation is stored in a data store as a vector usable by a machine learning model to perform a query on the document according to the hierarchical structure.
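
The first steps of the abstract can be illustrated with a minimal sketch. It is not the patented method. It assumes font size is the only formatting attribute, that the statistical measure is a simple frequency count, and that any size larger than the most common (body) size marks a heading. The names Line, Chunk, heading_levels, and chunk_by_hierarchy are hypothetical, and the embedding and vector-store step is omitted.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Line:
    text: str
    font_size: float  # assumed formatting attribute; real parsers expose more


@dataclass
class Chunk:
    path: list  # heading trail, outermost heading first
    body: list = field(default_factory=list)  # body lines under that trail


def heading_levels(lines):
    """Correlate font sizes with heading levels via a frequency count.

    Assumption: the most frequent size is body text, and each larger
    size is a heading, with the largest size mapped to level 1.
    """
    counts = Counter(line.font_size for line in lines)
    body_size = counts.most_common(1)[0][0]
    larger = sorted((s for s in counts if s > body_size), reverse=True)
    return {size: level for level, size in enumerate(larger, start=1)}


def chunk_by_hierarchy(lines):
    """Split lines into chunks, each tagged with its heading trail."""
    levels = heading_levels(lines)
    trail = []   # open headings as (level, text) pairs
    chunks = []
    for line in lines:
        level = levels.get(line.font_size)
        if level is not None:
            # A new heading closes every open heading at the same or a deeper level.
            while trail and trail[-1][0] >= level:
                trail.pop()
            trail.append((level, line.text))
            chunks.append(Chunk(path=[text for _, text in trail]))
        else:
            if not chunks:
                chunks.append(Chunk(path=[]))  # body text before any heading
            chunks[-1].body.append(line.text)
    return chunks


# Illustrative input only; the sizes and text are made up.
doc = [
    Line("Guide", 24), Line("Intro text.", 11),
    Line("Setup", 16), Line("Install it.", 11), Line("Run it.", 11),
    Line("Usage", 16), Line("Call it.", 11),
]
chunks = chunk_by_hierarchy(doc)
for chunk in chunks:
    print(" > ".join(chunk.path), "|", " ".join(chunk.body))
```

Because each chunk carries its heading trail, a later step could embed the trail together with the body text, so that a query can be scoped to one branch of the hierarchy.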

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.