Patent · US Active

Schema discovery through statistical transduction

US10331633B2 · kind B2 · utility

Cited by: 0
References: 7
Claims: 16
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jun 4, 2015
Grant date: Jun 25, 2019
Priority date
Expiry date: Oct 25, 2036

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/30
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method, system, and computer program product derive data schema for application to a data set. One or more processors generate a directed acyclic weighted graph that encodes data types and semantic types used by a data set. One or more processors assign estimated frequencies for each component of the directed acyclic weighted graph, where the estimated frequencies predict a likelihood of a particular data schema element being used by any data set. One or more processors traverse through paths in the directed acyclic weighted graph with a predetermined portion of the data set to determine a data schema that correctly defines data from the data set and identifies any errors in the data set, and then apply the data schema to the data set to generate clean data that is properly formatted.
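The flow described in the abstract (a weighted directed acyclic graph of types, estimated frequencies on its components, traversal driven by a portion of the data, then application of the resulting schema) can be illustrated with a minimal Python sketch. This is not the patented method. The graph contents, the prior weights in `PRIOR`, the `min_fit` threshold, and the names `GRAPH`, `MATCHERS`, `infer_type`, and `apply_type` are all hypothetical choices made for illustration, and the sketch handles a single column of string values only.

```python
import re
from datetime import datetime

# Hypothetical type DAG: edges point from a general type to more specific ones.
GRAPH = {
    "string": ["number", "date"],
    "number": ["integer"],
    "integer": [],
    "date": [],
}

# Assumed estimated frequencies (priors) for each node; the values are invented.
PRIOR = {"string": 0.40, "number": 0.20, "integer": 0.25, "date": 0.15}


def _is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def _is_date(value):
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


# One predicate per node that decides whether a raw string fits that type.
MATCHERS = {
    "string": lambda value: True,
    "number": _is_number,
    "integer": lambda value: re.fullmatch(r"[+-]?\d+", value) is not None,
    "date": _is_date,
}


def infer_type(sample, root="string", min_fit=0.75):
    """Walk the DAG from the root, descending while a child fits the sample."""
    if not sample:
        return root
    node = root
    while True:
        candidates = []
        for child in GRAPH[node]:
            fit = sum(MATCHERS[child](v) for v in sample) / len(sample)
            if fit >= min_fit:
                # Rank surviving children by observed fit weighted by the prior.
                candidates.append((fit * PRIOR[child], child))
        if not candidates:
            return node
        node = max(candidates)[1]


def apply_type(values, type_name):
    """Split values into those matching the inferred type and flagged errors."""
    clean, errors = [], []
    for index, value in enumerate(values):
        if MATCHERS[type_name](value):
            clean.append(value)
        else:
            errors.append((index, value))
    return clean, errors


column = ["12", "7", "x9", "40", "3", "18"]
inferred = infer_type(column[:5])  # use only a portion of the column
clean, errors = apply_type(column, inferred)
```

In this sketch the threshold lets a type win even when a minority of sampled values fail to match, so malformed entries such as `"x9"` surface as errors instead of forcing the whole column back to `string`. A fuller treatment would presumably cover semantic types and multi-field schemas, which the abstract mentions but this example does not attempt.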

Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.