Patent · US Active

Method for synthetic data generation for query workloads

US9244950B2 · kind B2 · utility

Cited by: 0
References: 2
Claims: 10
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jul 3, 2013
Grant date: Jan 26, 2016
Priority date:
Expiry date: Dec 20, 2033

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F11/36
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Generation of synthetic database data uses annotated query subplans for a multiple-table query workload, where the annotations include a desired cardinality for each node (v) in the subplans. The subplans may be merged and represented by a directed acyclic graph (DAG). The maximum entropy joint probability distribution over the attributes (x) is determined as

$$p(\mathbf{x}) = \frac{1}{Z} \prod_{v} w_v^{f_v(\mathbf{x})}$$

with one factor for each node v, where $w_v$ is a weight of node v, $f_v$ is a conjunct of predicates in the subplan rooted at node v, and Z is a normalization factor. The distribution is determined so that the desired cardinalities, and the selectivities of each node v derived from them, are satisfied. The data for a plurality of tables are then generated by sampling this maximum entropy joint probability distribution over the domain of the attributes (x) of those tables. In this way, data can be generated efficiently for multiple-table queries and for DAGs.
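The abstract describes two steps: choosing the weights $w_v$ so that every node's selectivity is met, then sampling tuples from the resulting distribution. The following is a minimal Python sketch of that idea under heavy simplifying assumptions. It uses a single flattened domain of two small integer attributes instead of multiple tables, two hypothetical nodes `v1` and `v2` with made-up predicates and target selectivities, and cyclic iterative scaling as the fitting method. Iterative scaling is a standard way to fit this product form, but it is not necessarily the procedure claimed in the patent. All names (`DOMAIN`, `NODES`, `fit_weights`, `sample_rows`) are illustrative.

```python
import itertools
import random

# Hypothetical toy domain: one flattened relation with two integer attributes
# (a, b), each in 0..9. The patent targets multiple tables; this is a simplification.
DOMAIN = list(itertools.product(range(10), range(10)))

# Hypothetical subplan nodes: name -> (conjunct f_v, target selectivity s_v).
# v2 sits above v1 in the subplan, so its conjunct includes v1's predicate.
NODES = {
    "v1": (lambda x: x[0] < 5, 0.3),
    "v2": (lambda x: x[0] < 5 and x[1] >= 7, 0.1),
}


def distribution(weights, domain, nodes):
    """Return p(x) = (1/Z) * prod_v w_v ** f_v(x) for every x in domain."""
    unnorm = []
    for x in domain:
        mass = 1.0
        for name, (f, _) in nodes.items():
            if f(x):  # f_v(x) = 1 contributes a factor w_v; f_v(x) = 0 contributes 1
                mass *= weights[name]
        unnorm.append(mass)
    z = sum(unnorm)  # normalization factor Z
    return [m / z for m in unnorm]


def selectivity(p, domain, f):
    """Probability mass of the tuples that satisfy conjunct f."""
    return sum(px for px, x in zip(p, domain) if f(x))


def fit_weights(domain, nodes, iters=200, tol=1e-10):
    """Cyclic iterative scaling: rescale one w_v at a time so that P(f_v = 1)
    equals s_v exactly, and repeat until every node is within tol.
    Assumes 0 < s_v < 1, mutually consistent targets, and predicates that
    are neither empty nor true on the whole domain."""
    weights = {name: 1.0 for name in nodes}
    for _ in range(iters):
        max_err = 0.0
        for name, (f, s) in nodes.items():
            q = selectivity(distribution(weights, domain, nodes), domain, f)
            max_err = max(max_err, abs(q - s))
            # Closed-form single-weight update that moves P(f_v = 1) from q to s.
            weights[name] *= (s * (1 - q)) / (q * (1 - s))
        if max_err < tol:
            break
    return weights


def sample_rows(n, weights, domain, nodes, seed=0):
    """Draw n synthetic tuples i.i.d. from the fitted distribution."""
    rng = random.Random(seed)
    return rng.choices(domain, weights=distribution(weights, domain, nodes), k=n)


if __name__ == "__main__":
    w = fit_weights(DOMAIN, NODES)
    rows = sample_rows(10_000, w, DOMAIN, NODES)
    for name, (f, s) in NODES.items():
        observed = sum(1 for x in rows if f(x)) / len(rows)
        print(name, round(w[name], 4), "target", s, "sampled", round(observed, 3))
```

With these toy targets the answer can be checked by hand: the probability mass splits 0.7 / 0.2 / 0.1 over the tuples outside v1, inside v1 but not v2, and inside v2, which gives $w_{v1} = 20/49$ and $w_{v2} = 7/6$. Because rows are drawn independently, the generated cardinalities match the targets in expectation rather than exactly.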

Source: USPTO / EPO open patent data (bibliographic details and citation counts).