Method for synthetic data generation for query workloads
US9244950B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jul 3, 2013 |
| Grant date | Jan 26, 2016 |
| Priority date | — |
| Expiry date | Dec 20, 2033 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG06F11/36
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
Generation of synthetic database data includes annotated query subplans for a multiple table query workload that includes a desired cardinality for nodes (v) in the subplans. The subplans may be merged and represented by a direct acyclic graph (DAG). The maximum entropy joint probability distribution for each attribute (x) for each node (v) is determined as:for each node ν, where wv is a weight of node v, fv is a conjunct of predicates in a subplan rooted at node v, and Z is a normalization factor. This distribution is determined such that the desired cardinality, and selectivities for each node v determined from the desired cardinality, are satisfied. The data for a plurality of tables are generated by sampling the maximum entropy joint probability distribution for a domain of attributes (x) of a plurality of tables. Data may be efficiently generated for multiple table queries and for DAGs.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.