Patent · US Expired

Sampling for aggregation queries

US6842753B2 · kind B2 · utility

Cited by	31
References	8
Claims	26
Family size	0

Assignee

Microsoft Corporation · US

Inventors

Surajit Chaudhuri · Redmond, US
Vivek Narasayya · Redmond, US
Rajeev Motwani · Palo Alto, US
Mayur Datar · Stanford, US

Key dates

Filing date	Jan 12, 2001
Grant date	Jan 11, 2005
Priority date	—
Expiry date	Nov 22, 2021

Classification

Technology area (CPC Y)	Emerging Cross-Sectional Technologies
CPC primary	Y10S707/99943
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled in one of many known ways to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data. Further methods involve the use of weighted sampling and weighted selection of outlier values for low selectivity queries, or queries having group by.
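
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (split_outliers, estimate_sum), the choice of a SUM aggregate, the fixed outlier budget num_outliers, and the use of uniform random sampling for the remaining data are all assumptions made for illustration. The outlier index, weighted sampling, and group-by handling mentioned in the abstract are not modeled.

```python
import random


def split_outliers(values, num_outliers):
    """Split values into (outliers, remaining).

    The remaining values are the contiguous window of the sorted data,
    of size len(values) - num_outliers, that has the lowest variance.
    Everything outside that window is treated as an outlier.
    """
    ordered = sorted(values)
    window = len(ordered) - num_outliers
    if num_outliers < 0 or window <= 0:
        raise ValueError("num_outliers must be in [0, len(values))")

    # Prefix sums give each window's sum and sum of squares in O(1).
    # (Assumption: precision of this variance formula is acceptable
    # for a sketch; a production version might prefer a stabler method.)
    prefix, prefix_sq = [0.0], [0.0]
    for v in ordered:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    best_start, best_var = 0, float("inf")
    for start in range(num_outliers + 1):  # slide the window
        s = prefix[start + window] - prefix[start]
        sq = prefix_sq[start + window] - prefix_sq[start]
        var = sq / window - (s / window) ** 2
        if var < best_var:
            best_start, best_var = start, var

    remaining = ordered[best_start:best_start + window]
    outliers = ordered[:best_start] + ordered[best_start + window:]
    return outliers, remaining


def estimate_sum(values, num_outliers, sample_size, rng=None):
    """Estimate SUM(values): exact outlier sum plus extrapolated sample sum."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    rng = rng or random.Random()
    outliers, remaining = split_outliers(values, num_outliers)

    exact_part = sum(outliers)  # outliers are aggregated exactly
    k = min(sample_size, len(remaining))
    sample = rng.sample(remaining, k)  # uniform sample without replacement
    scaled_part = sum(sample) * len(remaining) / k  # extrapolate to the rest
    return exact_part + scaled_part


# Hypothetical data: 98 ordinary rows and two extreme ones.
data = [10] * 98 + [5000, 9000]
estimate = estimate_sum(data, num_outliers=2, sample_size=10,
                        rng=random.Random(0))
```

In this toy data set the two extreme values dominate the total. Aggregating them exactly means the sample only has to represent the low-variance remainder, which is the motivation the abstract gives for pruning outliers before sampling.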

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.