Continuous builds of derived datasets in response to other dataset updates
US11379525B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Event | Date |
| --- | --- |
| Filing date | Apr 25, 2018 |
| Grant date | Jul 5, 2022 |
| Priority date | — |
| Expiry date | Jun 9, 2039 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/27
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Techniques for automatically scheduling builds of derived datasets in a distributed database system that supports pipelined data transformations are described herein. In an embodiment, a data processing method comprises obtaining a definition of at least one derived dataset of a data pipeline, and in response to the obtaining: creating and storing a dependency graph in memory, the dependency graph representing the at least one derived dataset and one or more raw datasets or intermediate derived datasets on which the at least one derived dataset depends; detecting a first update to a first dataset from among the one or more raw datasets or intermediate derived datasets on which the at least one derived dataset depends, and in response to the first update: based on the dependency graph, initiating a first build of a first intermediate derived dataset that depends on the first dataset; initiating a second build that uses the first intermediate derived dataset and that is next in order in the data pipeline according to the dependency graph; asynchronously detecting a second update to a second dataset from among the one or more raw datasets or intermediate derived datasets on which the …
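The scheduling idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the dataset names, the `DEPENDS_ON` mapping, and the `downstream_of` and `on_update` helpers are hypothetical, and updates are handled one at a time rather than asynchronously. The sketch only shows how a dependency graph can determine which derived datasets to rebuild, and in what order, when one dataset changes.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each derived dataset maps to the datasets it depends on.
DEPENDS_ON = {
    "clean_orders": {"raw_orders"},
    "clean_users": {"raw_users"},
    "orders_by_user": {"clean_orders", "clean_users"},
}


def downstream_of(updated, depends_on):
    """Return the derived datasets affected by an update, in build order."""
    # Invert the edges so each dataset points at its direct dependents.
    dependents = {}
    for dataset, parents in depends_on.items():
        for parent in parents:
            dependents.setdefault(parent, set()).add(dataset)

    # Walk the inverted graph to collect everything transitively affected.
    affected, stack = set(), [updated]
    while stack:
        for child in dependents.get(stack.pop(), ()):
            if child not in affected:
                affected.add(child)
                stack.append(child)

    # Topologically sort the whole graph, then keep only the affected datasets,
    # so every dataset is built after the datasets it depends on.
    order = TopologicalSorter(depends_on).static_order()
    return [dataset for dataset in order if dataset in affected]


def on_update(updated, depends_on, build):
    """React to an update by building each affected dataset in order."""
    plan = downstream_of(updated, depends_on)
    for dataset in plan:
        build(dataset)  # `build` is a caller-supplied (assumed) build function
    return plan


if __name__ == "__main__":
    on_update("raw_orders", DEPENDS_ON, lambda d: print("building", d))
```

In this toy pipeline, an update to `raw_orders` triggers a build of `clean_orders` followed by `orders_by_user`, while `clean_users` is left untouched because it does not depend on the updated dataset.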
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.