Patent · US Expired

Method and system for squashing a large data set

US6539391B1 · kind B1 · utility

Cited by	25

References	7

Claims	32

Family size	0

Assignee

AT&T CORP. · US

Inventors

William DuMouchel · Miami, US
Christopher Volinsky · Morristown, US
Theodore Johnson · New York, US
Corinna Cortes · New York, US
Daryl Pregibon · Hoboken, US

Key dates

Filing date	Aug 13, 1999
Grant date	Mar 25, 2003
Priority date	—
Expiry date	Aug 13, 2019

Classification

Technology area (CPC Y)	Emerging Cross-Sectional Technologies
CPC primary	Y10S707/99943
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Apparatus and method for summarizing an original large data set with a representative data set. The data elements in both the original data set and the representative data set have the same variables, but there are significantly fewer data elements in the representative data set. Each data element in the representative data set has an associated weight, representing the degree of compression. There are three steps for constructing the representative data set. First, the original data elements are partitioned into separate bins. Second, moments of the data elements partitioned in each bin are calculated. Finally, the representative data set is generated by finding data elements and associated weights having substantially the same moments as the original data set.
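The three steps in the abstract can be illustrated with a small sketch. Several of its choices are assumptions and do not come from the patent. It bins rows by equal-width intervals of the first variable (`partition`, `width`). It computes moments of order 0 to 2 per bin (count, mean vector, covariance). For step 3 it substitutes a closed-form symmetric construction for the patent's search: each bin of p variables becomes 2p points at the bin mean plus or minus sqrt(p) times each column of a Cholesky factor of the bin covariance, and each point carries weight n/(2p). This reproduces the bin's count, means, variances and covariances exactly, but not its higher-order moments. All function names are hypothetical.

```python
import math
from collections import defaultdict


def partition(data, width):
    """Step 1 (hypothetical rule): bin rows by equal-width intervals of variable 0."""
    bins = defaultdict(list)
    for row in data:
        bins[math.floor(row[0] / width)].append(row)
    return bins


def moments(rows):
    """Step 2: count, mean vector and covariance matrix (moments of order 0 to 2)."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / n
            for j in range(p)] for i in range(p)]
    return n, mean, cov


def cholesky(a, eps=1e-12):
    """Lower-triangular L with L @ L.T approximately equal to a positive semidefinite a."""
    p = len(a)
    L = [[0.0] * p for _ in range(p)]
    for j in range(p):
        d = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= eps:
            continue  # degenerate direction: column j stays zero
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, p):
            L[i][j] = (a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L


def squash(data, width):
    """Step 3 (substitute method): per bin, emit 2p weighted points matching moments of order <= 2."""
    squashed = []
    for rows in partition(data, width).values():
        n, mean, cov = moments(rows)
        p = len(mean)
        L = cholesky(cov)
        for j in range(p):
            for sign in (1.0, -1.0):
                # mean +/- sqrt(p) * (column j of L); each point carries weight n / (2p)
                point = tuple(mean[i] + sign * math.sqrt(p) * L[i][j] for i in range(p))
                squashed.append((point, n / (2 * p)))
    return squashed


def weighted_moments(pairs):
    """Total weight, weighted mean and weighted covariance of (point, weight) pairs."""
    total = sum(w for _, w in pairs)
    p = len(pairs[0][0])
    mean = [sum(w * x[j] for x, w in pairs) / total for j in range(p)]
    cov = [[sum(w * (x[i] - mean[i]) * (x[j] - mean[j]) for x, w in pairs) / total
            for j in range(p)] for i in range(p)]
    return total, mean, cov


if __name__ == "__main__":
    data = [(float(i), float((7 * i) % 5), float((i * i) % 11)) for i in range(100)]
    squashed = squash(data, width=25.0)
    print(len(data), "rows ->", len(squashed), "weighted points")
    print(moments(data)[1])
    print(weighted_moments(squashed)[1])
```

The demo prints `100 rows -> 24 weighted points`, and the two mean vectors agree up to rounding. The total weight equals the original row count, which is how the weights encode the degree of compression. Each bin always yields 2p points, so the output is smaller than the input only when bins hold more than 2p rows. Matching third- or higher-order moments would require more points per bin and, in general, an iterative solver.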

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.