Patent · US Active

Distributed histogram computation framework using data stream sketches and samples

US11455302B2 · kind B2 · utility

Cited by: 1
References: 5
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Aug 31, 2020
Grant date: Sep 27, 2022
Priority date
Expiry date: Dec 18, 2040

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/2462
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Methods for distributed histogram computation in a framework utilizing data stream sketches and samples are performed by systems and devices. Distributions of large data sets are scanned once and processed by a computing pool, without sorting, to generate local sketches and value samples of each distribution. The local sketches and samples are utilized to construct local histograms on which cardinality estimates are obtained for query plan generation of distributed queries against distributions. Local statistics of distributions are also merged and consolidated to construct a global histogram representative of the entire data set. The global histogram is utilized to determine a cardinality estimation for query plan generation of incoming queries against the entire data set. The addition of new data to a data set or distribution involves a scan of the new data from which new statistics are generated and then merged with existing statistics for a new global histogram.
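
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes a reservoir sample as the value sample and an exact row count standing in for the sketch, since the abstract does not name the sketch types used. All names (LocalStats, scan, merge, build_histogram, estimate_range) are hypothetical. Only the bounded-size samples are sorted, never the scanned data.

```python
import random


class LocalStats:
    """One-pass statistics for a single distribution (hypothetical structure)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity   # maximum number of sampled values kept
        self.count = 0             # exact row count; stand-in for a real sketch
        self.sample = []           # uniform value sample (reservoir)
        self._rng = random.Random(seed)

    def add(self, value):
        # Reservoir sampling (Algorithm R): one pass, bounded memory.
        self.count += 1
        if len(self.sample) < self.capacity:
            self.sample.append(value)
        else:
            j = self._rng.randrange(self.count)
            if j < self.capacity:
                self.sample[j] = value

    def weighted_points(self):
        # Each sampled value represents count / len(sample) rows.
        if not self.sample:
            return []
        weight = self.count / len(self.sample)
        return [(v, weight) for v in self.sample]


def scan(values, capacity, seed=0):
    """Scan one distribution once and return its local statistics."""
    stats = LocalStats(capacity, seed)
    for v in values:
        stats.add(v)
    return stats


def merge(stats_list):
    """Consolidate local statistics into one list of weighted points."""
    points = []
    for stats in stats_list:
        points.extend(stats.weighted_points())
    return points


def build_histogram(points, num_buckets):
    """Equi-depth histogram as a list of (low, high, estimated_rows)."""
    if not points:
        return []
    points = sorted(points)  # sorts the small sample only
    target = sum(w for _, w in points) / num_buckets
    buckets = []
    low, acc = points[0][0], 0.0
    for i, (value, weight) in enumerate(points):
        acc += weight
        last = i == len(points) - 1
        if last or (acc >= target and len(buckets) < num_buckets - 1):
            buckets.append((low, value, acc))
            if not last:
                low, acc = points[i + 1][0], 0.0
    return buckets


def estimate_range(buckets, lo, hi):
    """Estimate rows with lo <= value <= hi, assuming uniformity per bucket."""
    est = 0.0
    for b_lo, b_hi, rows in buckets:
        if b_hi < lo or b_lo > hi:
            continue
        if b_hi == b_lo:
            est += rows
        else:
            est += rows * (min(hi, b_hi) - max(lo, b_lo)) / (b_hi - b_lo)
    return est


# Local histograms per distribution, then a global histogram over all of them.
local_a = scan(range(0, 1000), capacity=100, seed=1)
local_b = scan(range(1000, 2000), capacity=100, seed=2)
local_hist_a = build_histogram(local_a.weighted_points(), 4)
global_hist = build_histogram(merge([local_a, local_b]), 8)

# New data: scan only the new rows, then merge with the existing statistics.
local_c = scan(range(2000, 2500), capacity=100, seed=3)
global_hist = build_histogram(merge([local_a, local_b, local_c]), 8)
```

In this sketch the existing distributions are never rescanned when new data arrives; only their retained statistics are merged again. A production system would likely replace the exact count and plain reservoir with mergeable sketches, which this example does not attempt to model.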

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.