Inferring a dataset schema from input files
US12210491B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Feb 9, 2024 |
| Grant date | Jan 28, 2025 |
| Priority date | — |
| Expiry date | Feb 9, 2044 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/205
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method comprises selecting a sample excerpt from a data input file; in response to determining that a first row in the sample excerpt does not contain a delimited value and a second row does contain a delimited value, determining that the first row consists of header data; identifying one or more jagged rows based on row delimiters that were erroneously placed; causing display of the text that led to creation of a jagged row; receiving an addition or removal of a specific row delimiter in the text; updating the sample excerpt based on the addition or the removal; analyzing the sample excerpt to determine a row delimiter for the data input file; identifying a plurality of rows that are not included in the header data; identifying a plurality of candidate column delimiters; and generating a candidate schema for the data input file.
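The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the candidate delimiter set, the function names (`split_header`, `infer_column_delimiter`, `find_jagged_rows`, `candidate_schema`), the scoring rule, and the sample text are all assumptions made for illustration. The row delimiter is taken as a parameter rather than inferred, and the user's correction step is simulated by editing the sample string directly.

```python
from collections import Counter

# Assumed candidate set; the abstract does not enumerate delimiters.
CANDIDATE_DELIMITERS = [",", "\t", ";", "|"]


def split_header(rows, delimiters=CANDIDATE_DELIMITERS):
    """Treat leading rows with no candidate delimiter as header data,
    provided a later row does contain one."""
    for i, row in enumerate(rows):
        if any(d in row for d in delimiters):
            return rows[:i], rows[i:]
    return [], rows  # no delimited row found, so nothing is called a header


def infer_column_delimiter(rows, delimiters=CANDIDATE_DELIMITERS):
    """Pick the candidate whose most common nonzero per-row count is shared
    by the most rows (ties between candidates go to the higher count)."""
    best, best_score = None, (0, 0)
    for d in delimiters:
        counts = Counter(row.count(d) for row in rows)
        counts.pop(0, None)  # rows lacking the delimiter do not support it
        if not counts:
            continue
        modal_count, support = counts.most_common(1)[0]
        if (support, modal_count) > best_score:
            best, best_score = d, (support, modal_count)
    return best


def find_jagged_rows(rows, delimiter):
    """Return the most common field count and the indices of rows that differ."""
    widths = [len(row.split(delimiter)) for row in rows]
    expected = Counter(widths).most_common(1)[0][0]
    return expected, [i for i, w in enumerate(widths) if w != expected]


def candidate_schema(sample, row_delimiter="\n"):
    """Build a candidate schema (a plain dict here) for a sample excerpt."""
    rows = [r for r in sample.split(row_delimiter) if r]
    header, data = split_header(rows)
    delimiter = infer_column_delimiter(data)
    if delimiter is None:
        return {"header_rows": header, "column_delimiter": None,
                "column_count": 1, "jagged_rows": []}
    width, jagged = find_jagged_rows(data, delimiter)
    return {"header_rows": header, "column_delimiter": delimiter,
            "column_count": width, "jagged_rows": jagged}


# Hypothetical sample: a stray newline splits the record "2,Bob,Paris" in two.
sample = "Quarterly export\nid,name,city\n1,Ada,London\n2,Bob\n,Paris\n3,Cy,Oslo\n"
schema = candidate_schema(sample)

# Simulated user correction: remove the erroneous row delimiter and re-run.
fixed = sample.replace("2,Bob\n,Paris", "2,Bob,Paris")
fixed_schema = candidate_schema(fixed)
```

In this sketch the two halves of the split record are reported as jagged rows (indices are relative to the non-header rows), and they disappear once the stray row delimiter is removed. Scoring delimiters by how consistently they appear across rows is one simple heuristic; the patent's claims may rank candidates differently.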
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.