Hierarchical document parsing and metadata generation for machine learning applications
US12393637B1 · kind B1 · utility
Assignee
Inventor
Key dates
| Filing date | Dec 31, 2024 |
| Grant date | Aug 19, 2025 |
| Priority date | — |
| Expiry date | Dec 31, 2044 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/258
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A document is analyzed to identify formatting attributes of the document. A set of statistical measures is calculated for each of the formatting attributes. A correlation between these statistical measures and elements of the document is identified. A hierarchical relationship is determined between the elements of the document. Data from the document is split into chunks using the hierarchical relationship. A representation of the document that includes the hierarchical structure is then generated. The representation is stored in a data store as a vector usable by a machine learning model to perform a query on the document according to the hierarchical structure.
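The pipeline in the abstract can be illustrated with a minimal Python sketch. It is not the patented method. The abstract does not say which formatting attributes or statistical measures are used, so the sketch assumes a single attribute (font size) and a single statistic (its frequency count): the most common size is treated as body text, and each larger size is treated as a heading level. The names `Line`, `Chunk`, `heading_levels`, and `chunk_by_hierarchy` are hypothetical, and the embedding and vector storage step is left out.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Line:
    text: str
    font_size: float  # assumed formatting attribute; the abstract names none


@dataclass
class Chunk:
    path: list  # heading trail from the top-level heading down to this chunk
    text: str


def heading_levels(lines):
    """Correlate font sizes with heading levels via a frequency statistic."""
    counts = Counter(line.font_size for line in lines)
    body_size = counts.most_common(1)[0][0]  # assumption: most frequent size is body text
    larger = sorted((s for s in counts if s > body_size), reverse=True)
    return {size: level for level, size in enumerate(larger, start=1)}


def chunk_by_hierarchy(lines):
    """Split body text into chunks, each tagged with its heading path."""
    levels = heading_levels(lines)
    path = []    # stack of (level, heading text)
    body = []    # body lines collected under the current heading
    chunks = []

    def flush():
        if body:
            chunks.append(Chunk([title for _, title in path], " ".join(body)))
            body.clear()

    for line in lines:
        level = levels.get(line.font_size)
        if level is None:  # body text
            body.append(line.text)
            continue
        flush()
        # Drop headings at the same or a deeper level before descending again.
        while path and path[-1][0] >= level:
            path.pop()
        path.append((level, line.text))
    flush()
    return chunks


doc = [
    Line("Guide", 24), Line("Setup", 18),
    Line("Install the tool.", 11), Line("Run the installer.", 11),
    Line("Usage", 18), Line("Call the API.", 11),
]
chunks = chunk_by_hierarchy(doc)
for chunk in chunks:
    print(" > ".join(chunk.path), "|", chunk.text)
```

Each chunk carries its heading path, so a later embedding step could store that path alongside the vector and let queries be scoped to a section of the hierarchy.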
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.