Detecting correlation from data
US7647293B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jun 10, 2004 |
| Grant date | Jan 12, 2010 |
| Priority date | — |
| Expiry date | Sep 19, 2024 |
Classification
- Technology area (CPC Y)Emerging Cross-Sectional Technologies
- CPC primaryY10S707/99932
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
A system and method of discovering dependencies between relational database column pairs and application of discoveries to query optimization is provided. For each candidate column pair remaining after simultaneously generating column pairs, pruning pairs not satisfying specified heuristic constraints, and eliminating pairs with trivial instances of correlation, a random sample of data values is collected. A candidate column pair is tested for the existence of a soft functional dependency (FD), and if a dependency is not found, statistically tested for correlation using a robust chi-squared statistic. Column pairs for which either a soft FD or a statistical correlation exists are prioritized for recommendation to a query optimizer, based on any of: strength of dependency, degree of correlation, or adjustment factor; statistics for recommended columns pairs are tracked to improve selectivity estimates. Additionally, a dependency graph representing correlations and dependencies as edges and column pairs as nodes is provided.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.