Patent · US Active

Automatic discovery of relevant data in massive datasets

US9558245B1 · kind B1 · utility

10 Cited by

4 References

18 Claims

0 Family size

Assignee

International Business Machines Corporation · US

Inventors

Lei Gao · Beijing, CN
Sier Han · Xi'an, CN
Jing Xu · Shanghai, CN
Ji Hui Yang · Beijing, CN
Zongyao Zhang · Yangshuo, CN

Key dates

Filing date	Dec 7, 2015
Grant date	Jan 31, 2017
Priority date	—
Expiry date	Dec 7, 2035

Classification

Technology area (CPC G)	Physics
CPC primary	G06F16/285
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

An approach for discovery of relevant data in massive datasets. Compare datasets, which include compare key fields and compare data fields, and a core dataset, which includes target data field(s) and core field(s), are received. The compare datasets are categorized into direct and indirect related dataset pools based on the correlation strength of the target data field(s) with matching compare and core fields. The direct related dataset pool and the core dataset are transformed into reduction datasets based on a statistical measure of the values of the target data fields, shared key fields and compare data fields. Target correlations of the reduction datasets are created based on reduction compare and target data fields. A statistical relationship strength between the core dataset and the direct related dataset pool is created based on a statistical mean of the target correlations, and a relevancy data store is created.
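
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation, and every name in it (`reduce_by_key`, `pearson`, `build_relevancy_store`, the sample fields) is hypothetical. It rests on three assumptions that the abstract does not confirm: a compare dataset counts as "direct" when it carries the same key field as the core dataset, the "statistical measure" used for reduction is the per-key mean, and the correlation is the absolute Pearson coefficient.

```python
from collections import defaultdict
from statistics import mean


def reduce_by_key(rows, key, field):
    """Collapse rows to one value per key: the mean of `field` (assumed measure)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[field])
    return {k: mean(v) for k, v in groups.items()}


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either side has no variance."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def build_relevancy_store(core_rows, core_key, target_field, compare_datasets):
    """Return ({direct dataset name: strength}, [indirect dataset names])."""
    reduced_target = reduce_by_key(core_rows, core_key, target_field)
    store, indirect = {}, []
    for name, rows in compare_datasets.items():
        # Assumption: "direct" means the dataset shares the core key field.
        if not rows or core_key not in rows[0]:
            indirect.append(name)
            continue
        correlations = []
        for field in (f for f in rows[0] if f != core_key):
            reduced = reduce_by_key(rows, core_key, field)
            shared = sorted(set(reduced) & set(reduced_target))
            if len(shared) < 2:
                continue  # too few shared keys to correlate
            xs = [reduced[k] for k in shared]
            ys = [reduced_target[k] for k in shared]
            correlations.append(abs(pearson(xs, ys)))
        # Relationship strength: mean of the per-field target correlations.
        store[name] = mean(correlations) if correlations else 0.0
    return store, indirect


# Hypothetical sample data: "sales" is the target field, "store" the key field.
core = [
    {"store": "A", "sales": 10},
    {"store": "A", "sales": 14},
    {"store": "B", "sales": 20},
    {"store": "C", "sales": 30},
]
compare = {
    "footfall": [
        {"store": "A", "visitors": 120},
        {"store": "B", "visitors": 200},
        {"store": "C", "visitors": 300},
    ],
    "weather": [{"city": "X", "rain_mm": 5}],
}
store, indirect = build_relevancy_store(core, "store", "sales", compare)
print(store, indirect)
```

In this sketch the returned dictionary plays the role of the relevancy data store: each directly related dataset is mapped to a single strength score, and datasets without a shared key are only listed as indirect.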

Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.