Efficient data infrastructure for high dimensional data analysis
US7870114B2 · kind B2 · utility
Key dates
| Event | Date |
|---|---|
| Filing date | Jun 15, 2007 |
| Grant date | Jan 11, 2011 |
| Priority date | — |
| Expiry date | Jul 4, 2028 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/283
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Described is a technology by which high-dimensional source data, organized as rows of records with identifiers and columns comprising dimensions of data values, are processed into a file model for efficient access. An inverted index for any dimension is built by mapping raw dimension values to mapped values based on mapping entries in a dimension table. The record identifiers are arranged into subgroups based on their mapped values; a count and/or an offset may be maintained for locating each subgroup. The raw values for a dimension are maintained within a raw value file. For sparse data, the raw value file may be compressed, e.g., by excluding nulls and associating a record identifier with each non-null value. A data manager provides access to the data in the data files, for example by offering various functions, and uses caching for efficiency.
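A minimal Python sketch of the indexing scheme the abstract outlines, written under assumptions the abstract does not state: record identifiers are 0-based row positions, the dimension table is a sorted list of bin lower bounds, nulls are left out of the inverted index, and the "data manager" simply caches each index after it is first built. Every name here (`DimensionTable`, `InvertedIndex`, `build_inverted_index`, `compress_sparse`, `lookup_sparse`, `DataManager`) is hypothetical and illustrative, not taken from the patent.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

Column = Sequence[Optional[float]]  # raw values for one dimension; None marks a null


@dataclass
class DimensionTable:
    """Hypothetical dimension table: sorted bin lower bounds.

    Bin m covers [lower_bounds[m], lower_bounds[m + 1]); raw values below
    the first bound are clamped into bin 0 (an assumption of this sketch).
    """
    lower_bounds: List[float]

    def map_value(self, raw: float) -> int:
        return max(bisect_right(self.lower_bounds, raw) - 1, 0)


@dataclass
class InvertedIndex:
    record_ids: List[int]  # record ids laid out subgroup by subgroup
    counts: List[int]      # counts[m]: number of records whose mapped value is m
    offsets: List[int]     # offsets[m]: where subgroup m starts in record_ids

    def subgroup(self, mapped: int) -> List[int]:
        start = self.offsets[mapped]
        return self.record_ids[start:start + self.counts[mapped]]


def build_inverted_index(column: Column, table: DimensionTable) -> InvertedIndex:
    buckets: List[List[int]] = [[] for _ in table.lower_bounds]
    for rid, raw in enumerate(column):  # assumption: record id = row position
        if raw is not None:             # assumption: nulls are left out of the index
            buckets[table.map_value(raw)].append(rid)
    counts = [len(b) for b in buckets]
    offsets, running = [], 0
    for c in counts:                    # offsets are prefix sums of the counts
        offsets.append(running)
        running += c
    record_ids = [rid for b in buckets for rid in b]
    return InvertedIndex(record_ids, counts, offsets)


def compress_sparse(column: Column) -> List[Tuple[int, float]]:
    """Sparse raw value file: drop nulls, pair each non-null value with its record id."""
    return [(rid, v) for rid, v in enumerate(column) if v is not None]


def lookup_sparse(pairs: List[Tuple[int, float]], rid: int) -> Optional[float]:
    # pairs are sorted by record id; (rid,) sorts before any (rid, value)
    # tuple, so bisect_left lands on the matching entry if it exists.
    i = bisect_left(pairs, (rid,))
    if i < len(pairs) and pairs[i][0] == rid:
        return pairs[i][1]
    return None  # record id absent from the file means its raw value was null


class DataManager:
    """Hypothetical accessor that caches each dimension's index after first use."""

    def __init__(self, columns: Dict[str, Column], tables: Dict[str, DimensionTable]):
        self._columns = columns
        self._tables = tables
        self._cache: Dict[str, InvertedIndex] = {}

    def index(self, dim: str) -> InvertedIndex:
        if dim not in self._cache:  # build lazily, then reuse the cached copy
            self._cache[dim] = build_inverted_index(self._columns[dim], self._tables[dim])
        return self._cache[dim]


if __name__ == "__main__":
    col = [3.0, None, 12.5, 7.0, None, 0.5]
    table = DimensionTable([0.0, 5.0, 10.0])
    idx = build_inverted_index(col, table)
    print(idx.counts, idx.offsets, idx.subgroup(0))  # [2, 1, 1] [0, 2, 3] [0, 5]
    print(lookup_sparse(compress_sparse(col), 3))    # 7.0
```

Keeping all record ids in one flat array, plus a count and an offset per mapped value, turns a subgroup lookup into a single slice instead of a scan. Because the offsets are prefix sums of the counts, either one can be derived from the other, which is consistent with the abstract's wording that a count and/or an offset may be maintained.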
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.