Computing and applying order statistics for data preparation
US8868573B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Apr 11, 2012 |
| Grant date | Oct 21, 2014 |
| Priority date | — |
| Expiry date | Aug 2, 2032 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06Q10/10
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Provided are techniques for generating order statistics and error bounds. For each of multiple, distributed data sources, a finite number of data bins are created for each field in that data source. Data values in each of the multiple, distributed data sources are processed to generate basic summaries for each of the data bins in a single pass of the data values. The data bins from each of the multiple, distributed data sources are sorted. One or more approximate order statistics are computed for a data set by accumulating counts from a number of the sorted data bins. Lower and upper error bounds are provided for each of the computed one or more approximate order statistics, wherein the lower and upper error bounds are values delimiting an interval containing a true value of an order statistic.
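The flow described in the abstract (single-pass bin summaries per source, sorted bins, accumulated counts, and error bounds) can be illustrated with a minimal sketch. This is not the patented method itself. It assumes a single numeric field, fixed-width bins, and a per-bin summary consisting of a count plus the observed minimum and maximum. All names (`BinSummary`, `summarize`, `merge_sources`, `approx_order_statistic`) and the interpolation rule used for the estimate are hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class BinSummary:
    """Assumed per-bin summary: count plus observed min and max."""
    count: int = 0
    lo: float = math.inf
    hi: float = -math.inf

    def add(self, x: float) -> None:
        self.count += 1
        self.lo = min(self.lo, x)
        self.hi = max(self.hi, x)

    def merge(self, other: "BinSummary") -> None:
        self.count += other.count
        self.lo = min(self.lo, other.lo)
        self.hi = max(self.hi, other.hi)


def summarize(values, width):
    """Single pass over one source: assign each value to a fixed-width bin."""
    bins = {}
    for x in values:
        key = math.floor(x / width)  # bin index; higher key means larger values
        bins.setdefault(key, BinSummary()).add(x)
    return bins


def merge_sources(per_source_bins):
    """Combine the bin summaries produced independently by each source."""
    merged = {}
    for bins in per_source_bins:
        for key, summary in bins.items():
            merged.setdefault(key, BinSummary()).merge(summary)
    return merged


def approx_order_statistic(merged, q):
    """Return (estimate, lower, upper) for the q-quantile, with 0 < q <= 1."""
    n = sum(s.count for s in merged.values())
    if n == 0:
        raise ValueError("no data")
    k = max(1, math.ceil(q * n))  # 1-based target rank
    seen = 0
    for key in sorted(merged):  # walk the bins in ascending value order
        s = merged[key]
        if seen + s.count >= k:
            # The k-th smallest value falls in this bin, so the bin's
            # observed min and max bracket the true order statistic.
            frac = (k - seen) / s.count
            estimate = s.lo + frac * (s.hi - s.lo)  # assumed: linear interpolation
            return estimate, s.lo, s.hi
        seen += s.count
    raise AssertionError("unreachable: k never exceeds n")


source_a = [1, 5, 12, 18, 25]
source_b = [3, 7, 14, 22, 29]
merged = merge_sources([summarize(source_a, 10), summarize(source_b, 10)])
estimate, lower, upper = approx_order_statistic(merged, 0.5)
print(estimate, lower, upper)
```

Because the bins partition the value range in order, the bin in which the running count first reaches the target rank must contain the true order statistic. In this sketch the width of the reported interval is therefore at most the bin width, so narrower bins trade memory for tighter bounds.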
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.