Patent · US Active

Organizing, joining, and performing statistical calculations on massive sets of data

US8935257B1 · kind B1 · utility

Cited by	96

References	0

Claims	28

Family size	0

Assignee

LinkedIn Corporation · US

Inventors

Srinivas Vemuri · Santa Clara, US
Maneesh Varshney · Los Angeles, US
Krishna P. Puttaswamy Naga · Metuchen, US
Rui Liu · Sunnyvale, US

Key dates

Filing date	Mar 17, 2014
Grant date	Jan 13, 2015
Priority date	—
Expiry date	Mar 17, 2034

Classification

Technology area (CPC G)	Physics
CPC primary	G06F16/278
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A system, method, and apparatus are provided for organizing and joining massive sets of data (e.g., tens or hundreds of millions of event records). A dataset is Blocked by first identifying a partition key, which comprises one or more columns of the data. Each Block will contain all dataset records that have partition key values assigned to that Block. A cost constraint (e.g., a maximum size, a maximum number of records) may also be applied to the Blocks. A Block index is generated to identify all Blocks, their corresponding (sequential) partition key values, and their locations. A second dataset that includes the partition key column(s) and that must be correlated with the first dataset may then be Blocked according to the same ranges of partition key values (but without the cost constraint). Corresponding Blocks of the datasets may then be Joined/Aggregated, and analyzed as necessary.
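
Illustrative sketch (not part of the patent record): the Blocking and Join flow summarized in the abstract can be approximated in a few lines of Python. This is a minimal in-memory sketch under stated assumptions, not the patented implementation. The function names (`block_primary`, `block_secondary`, `join_blocks`), the sample records, the record-count cost constraint, and the use of a list position as a Block "location" are all hypothetical choices made for illustration.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def block_primary(records, key, max_records):
    """Block the first dataset by partition key under a record-count cost constraint.

    All records sharing a key value land in the same Block, so a single key
    value with more than max_records rows still yields one oversized Block.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)

    blocks, index = [], []  # index entries: (first_key, last_key, block_position)
    current, first_key, last_key = [], None, None
    for k in sorted(groups):  # walk partition key values in sequential order
        rows = groups[k]
        if current and len(current) + len(rows) > max_records:
            index.append((first_key, last_key, len(blocks)))
            blocks.append(current)
            current, first_key = [], None
        if first_key is None:
            first_key = k
        current.extend(rows)
        last_key = k
    if current:
        index.append((first_key, last_key, len(blocks)))
        blocks.append(current)
    return blocks, index


def block_secondary(records, key, index):
    """Block the second dataset by the same key ranges, with no cost constraint."""
    if not index:
        return []
    starts = [first for first, _last, _pos in index]
    blocks = [[] for _ in index]
    for rec in records:
        # Keys outside every range fall into the nearest preceding Block (or the
        # first one); they have no partner there, so the join below skips them.
        pos = max(bisect_right(starts, rec[key]) - 1, 0)
        blocks[pos].append(rec)
    return blocks


def join_blocks(primary_blocks, secondary_blocks, key):
    """Join corresponding Blocks pairwise; only one Block pair is handled at a time."""
    for p_block, s_block in zip(primary_blocks, secondary_blocks):
        lookup = defaultdict(list)
        for s in s_block:
            lookup[s[key]].append(s)
        for p in p_block:
            for s in lookup.get(p[key], []):
                yield {**s, **p}


# Hypothetical sample data: event records and member profiles keyed by "member".
events = [
    {"member": 3, "event": "click"},
    {"member": 1, "event": "view"},
    {"member": 1, "event": "click"},
    {"member": 2, "event": "view"},
    {"member": 3, "event": "view"},
]
profiles = [
    {"member": 2, "country": "US"},
    {"member": 3, "country": "IN"},
    {"member": 1, "country": "CN"},
]

event_blocks, block_index = block_primary(events, "member", max_records=3)
profile_blocks = block_secondary(profiles, "member", block_index)

# Aggregate after the join: number of events per country.
events_per_country = Counter(
    row["country"] for row in join_blocks(event_blocks, profile_blocks, "member")
)
```

Because both datasets are cut on the same partition key ranges, each pair of corresponding Blocks can be joined independently of the others, which is what would allow the work to be spread across many workers when the data does not fit on one machine.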

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.