Patent · US Active

Efficient data infrastructure for high dimensional data analysis

US7870114B2 · kind B2 · utility

Cited by	7

References	65

Claims	20

Family size	0

Assignee

Microsoft Corporation · US

Inventors

Haidong Zhang · Beijing, CN
Guowei Liu · Wuxi, CN
Yantao Li · Beijing, CN
Bing Sun · Beijing, CN
Jian Wang · Beijing, CN

Key dates

Filing date	Jun 15, 2007
Grant date	Jan 11, 2011
Priority date	—
Expiry date	Jul 4, 2028

Classification

Technology area (CPC G)	Physics
CPC primary	G06F16/283
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Described is a technology by which high dimensional source data corresponding to rows of records with identifiers, and columns comprising dimensions of data values, are processed into a file model for efficient access. An inverted index corresponding to any dimension is built by mapping data from raw dimension values to mapped values based on mapping entries in a dimension table. The record identifiers are arranged into subgroups based on their mapped value; a count and/or an offset may be maintained for locating each of the subgroups. The raw values for a dimension are maintained within a raw value file. For sparse data, the raw value file may be compressed, e.g., by excluding nulls and associating a record identifier with each non-null. A data manager provides access to data in the data files, such as by offering various functions, using caching for efficiency.
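
The structures named in the abstract can be illustrated with a minimal sketch for a single dimension. This is not the patented implementation: the function names (build_dimension_index, lookup, compress_sparse), the in-memory lists standing in for files, and the choice to assign mapped values in first-seen order are all assumptions made for illustration. The data manager and its caching are not modeled.

```python
from collections import defaultdict


def build_dimension_index(raw_values):
    """Build an inverted index for one dimension.

    raw_values[i] is the raw value of the record with identifier i;
    None stands for a null. All names here are hypothetical.
    """
    # Dimension table: raw value -> small integer mapped value.
    # Assumption: mapped values are assigned in first-seen order.
    dimension_table = {}
    groups = defaultdict(list)
    for record_id, raw in enumerate(raw_values):
        if raw is None:
            continue  # nulls are not indexed in this sketch
        mapped = dimension_table.setdefault(raw, len(dimension_table))
        groups[mapped].append(record_id)

    # Lay the subgroups out back to back and keep an (offset, count)
    # pair per mapped value so each subgroup can be located directly.
    record_ids = []
    directory = []
    for mapped in range(len(dimension_table)):
        ids = groups[mapped]
        directory.append((len(record_ids), len(ids)))
        record_ids.extend(ids)
    return dimension_table, directory, record_ids


def lookup(raw, dimension_table, directory, record_ids):
    """Return the record identifiers whose raw value equals `raw`."""
    mapped = dimension_table.get(raw)
    if mapped is None:
        return []
    offset, count = directory[mapped]
    return record_ids[offset:offset + count]


def compress_sparse(raw_values):
    """Sparse raw value storage: drop nulls, pair each non-null value
    with its record identifier."""
    return [(rid, raw) for rid, raw in enumerate(raw_values) if raw is not None]


colors = ["red", None, "blue", "red", None, "blue", "red"]
table, directory, ids = build_dimension_index(colors)
print(lookup("blue", table, directory, ids))  # [2, 5]
print(compress_sparse(colors))
```

Storing an offset and a count per mapped value means a query for one value reads a single contiguous slice of record identifiers instead of scanning every record.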

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.