Distributed histogram computation framework using data stream sketches and samples
US11455302B2 · kind B2 · utility
Key dates
| Event | Date |
|---|---|
| Filing date | Aug 31, 2020 |
| Grant date | Sep 27, 2022 |
| Priority date | — |
| Expiry date | Dec 18, 2040 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/2462
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Methods for distributed histogram computation in a framework utilizing data stream sketches and samples are performed by systems and devices. Distributions of large data sets are scanned once and processed by a computing pool, without sorting, to generate local sketches and value samples of each distribution. The local sketches and samples are utilized to construct local histograms on which cardinality estimates are obtained for query plan generation of distributed queries against distributions. Local statistics of distributions are also merged and consolidated to construct a global histogram representative of the entire data set. The global histogram is utilized to determine a cardinality estimation for query plan generation of incoming queries against the entire data set. The addition of new data to a data set or distribution involves a scan of the new data from which new statistics are generated and then merged with existing statistics for a new global histogram.
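The abstract outlines a pipeline with four steps. Each distribution is scanned once, without sorting, into a sketch and a value sample. Local histograms are built from those statistics. The local statistics are merged into a global histogram. New data is folded in by scanning only the new rows and merging. The abstract does not name the sketch, sampling scheme, or histogram type, so the Python below is only a minimal sketch under stated assumptions. It uses a k-minimum-values (KMV) distinct-count sketch, reservoir sampling (Algorithm R), and an equi-depth histogram built from a weighted sample. The names `scan`, `merge`, `build_histogram`, `estimate_le`, `estimate_eq`, `K`, and `SAMPLE_SIZE` are hypothetical and do not come from the patent.

```python
import bisect
import hashlib
import random
from dataclasses import dataclass, field

K = 64             # hypothetical KMV sketch size (k smallest hashes kept)
SAMPLE_SIZE = 200  # hypothetical reservoir size per distribution

def hash01(value) -> float:
    """Deterministic pseudo-uniform hash in [0, 1), identical on every node."""
    digest = hashlib.sha1(repr(value).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

@dataclass
class Stats:
    """Mergeable statistics for one distribution or a union of distributions."""
    rows: int = 0
    kmv: set = field(default_factory=set)       # K smallest distinct hashes
    sample: list = field(default_factory=list)  # (value, weight) pairs

    def ndv(self) -> float:
        """Distinct-value estimate from the KMV sketch."""
        if len(self.kmv) < K:
            return float(len(self.kmv))  # sketch not full, so the count is exact
        return (K - 1) / max(self.kmv)

def scan(values, seed=0) -> Stats:
    """Single unsorted pass: maintain the KMV sketch and a reservoir sample (Algorithm R)."""
    rng, st, reservoir = random.Random(seed), Stats(), []
    for v in values:
        st.rows += 1
        h = hash01(v)
        if h not in st.kmv:
            if len(st.kmv) < K:
                st.kmv.add(h)
            elif h < max(st.kmv):
                st.kmv.remove(max(st.kmv))  # evict the largest kept hash
                st.kmv.add(h)
        if len(reservoir) < SAMPLE_SIZE:
            reservoir.append(v)
        else:
            j = rng.randrange(st.rows)  # keeps the new row with prob SAMPLE_SIZE / rows
            if j < SAMPLE_SIZE:
                reservoir[j] = v
    # Each sampled value stands for rows / len(reservoir) rows of this distribution.
    w = st.rows / len(reservoir) if reservoir else 0.0
    st.sample = [(v, w) for v in reservoir]
    return st

def merge(a: Stats, b: Stats) -> Stats:
    """Combine statistics without rescanning: union the sketches, pool weighted samples."""
    return Stats(a.rows + b.rows, set(sorted(a.kmv | b.kmv)[:K]), a.sample + b.sample)

@dataclass
class Histogram:
    low: float      # smallest sampled value
    bounds: list    # ascending bucket upper bounds
    cum_rows: list  # estimated rows with value <= each bound
    rows: int
    ndv: float

def build_histogram(st: Stats, buckets: int = 4) -> Histogram:
    """Equi-depth histogram over the small weighted sample (assumed non-empty)."""
    items = sorted(st.sample)  # only the sample is sorted, never the base data
    total = sum(w for _, w in items)
    bounds, cum, acc = [], [], 0.0
    for i, (v, w) in enumerate(items):
        acc += w
        if i == len(items) - 1 or acc >= total * (len(bounds) + 1) / buckets:
            bounds.append(v)
            cum.append(acc)
    return Histogram(items[0][0], bounds, cum, st.rows, st.ndv())

def estimate_le(h: Histogram, x) -> float:
    """Estimated rows with value <= x, interpolating linearly inside a bucket."""
    i = bisect.bisect_right(h.bounds, x)
    if i == len(h.bounds):
        return h.cum_rows[-1]
    lo, lo_rows = (h.bounds[i - 1], h.cum_rows[i - 1]) if i else (h.low, 0.0)
    hi, hi_rows = h.bounds[i], h.cum_rows[i]
    if x < lo or hi == lo:
        return lo_rows
    return lo_rows + (hi_rows - lo_rows) * (x - lo) / (hi - lo)

def estimate_eq(h: Histogram) -> float:
    """Rows per distinct value, assuming every value is equally frequent."""
    return h.rows / h.ndv if h.ndv else 0.0

if __name__ == "__main__":
    s1 = scan([i % 500 for i in range(10_000)], seed=1)        # distribution 1
    s2 = scan([250 + i % 500 for i in range(6_000)], seed=2)   # distribution 2
    g = build_histogram(merge(s1, s2))  # global histogram, no rescan of either distribution
    print(g.rows, round(g.ndv), round(estimate_le(g, 300)))
    s3 = scan([800 + i % 50 for i in range(1_000)], seed=3)    # newly loaded rows only
    g2 = build_histogram(merge(merge(s1, s2), s3))
    print(g2.rows, round(estimate_eq(g2)))
```

In this sketch, merging is what keeps the incremental path cheap. KMV sketches union exactly, because the K smallest hashes of the union are always among each side's K smallest. Each sampled value also carries a weight, so samples from distributions of different sizes can be pooled without rescanning. Equality estimates use the rows-per-distinct-value rule, which is a common optimizer assumption but not necessarily the one the patent claims.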
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.