Patent · US Active

Continuous builds of derived datasets in response to other dataset updates

US11379525B1 · kind B1 · utility

Cited by: 8
References: 70
Claims: 13
Family size: 0

Assignee

Inventors

Key dates

Filing date: Apr 25, 2018
Grant date: Jul 5, 2022
Priority date
Expiry date: Jun 9, 2039

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/27
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Techniques for automatically scheduling builds of derived datasets in a distributed database system that supports pipelined data transformations are described herein. In an embodiment, a data processing method comprises obtaining a definition of at least one derived dataset of a data pipeline, and in response to the obtaining: creating and storing a dependency graph in memory, the dependency graph representing the at least one derived dataset and one or more raw datasets or intermediate derived datasets on which the at least one derived dataset depends; detecting a first update to a first dataset from among the one or more raw datasets or intermediate derived datasets on which the at least one derived dataset depends, and in response to the first update: based on the dependency graph, initiating a first build of a first intermediate derived dataset that depends on the first dataset; initiating a second build that uses the first intermediate derived dataset and that is next in order in the data pipeline according to the dependency graph; asynchronously detecting a second update to a second dataset from among the one or more raw datasets or intermediate derived datasets on which the …
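
The flow described in the abstract (store a dependency graph, detect an update to a dataset, then initiate builds of the dependent datasets in pipeline order) can be illustrated with a minimal Python sketch. This is not the patented implementation: the class name `Pipeline`, its methods, and the dataset names in the usage note are all hypothetical, and the sketch assumes a single process with synchronous builds, whereas the abstract describes a distributed system with asynchronous update detection.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class Pipeline:
    """Hypothetical in-memory dependency graph for derived datasets."""

    def __init__(self, definitions):
        # definitions maps each derived dataset to the datasets it is built from
        # (raw datasets or intermediate derived datasets).
        self.parents = {name: set(deps) for name, deps in definitions.items()}
        # Reverse edges: dataset -> derived datasets that depend on it directly.
        self.children = defaultdict(set)
        for name, deps in self.parents.items():
            for dep in deps:
                self.children[dep].add(name)

    def downstream(self, updated):
        """Return every derived dataset that depends, directly or not, on `updated`."""
        seen, stack = set(), [updated]
        while stack:
            for child in self.children[stack.pop()]:
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen

    def build_order(self, updated):
        """Order affected datasets so each one follows its affected parents."""
        affected = self.downstream(updated)
        # Keep only edges between affected datasets; unaffected inputs are current.
        subgraph = {name: self.parents[name] & affected for name in affected}
        return list(TopologicalSorter(subgraph).static_order())

    def on_update(self, updated, build):
        """Assumed hook: call `build(dataset)` for each affected dataset in order."""
        for dataset in self.build_order(updated):
            build(dataset)
```

As a usage illustration with made-up dataset names: given `Pipeline({"clean": ["raw_a"], "joined": ["clean", "raw_b"], "report": ["joined"]})`, an update to `raw_a` triggers builds of `clean`, then `joined`, then `report`, while an update to `raw_b` triggers only `joined` and `report`.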

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.