Schema discovery through statistical transduction
US10331633B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jun 4, 2015 |
| Grant date | Jun 25, 2019 |
| Priority date | — |
| Expiry date | Oct 25, 2036 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/30
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method, system, and computer program product derive data schema for application to a data set. One or more processors generate a directed acyclic weighted graph that encodes data types and semantic types used by a data set. One or more processors assign estimated frequencies for each component of the directed acyclic weighted graph, where the estimated frequencies predict a likelihood of a particular data schema element being used by any data set. One or more processors traverse through paths in the directed acyclic weighted graph with a predetermined portion of the data set to determine a data schema that correctly defines data from the data set and identifies any errors in the data set, and then apply the data schema to the data set to generate clean data that is properly formatted.
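The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The graph layout, the node names, the edge weights (standing in for the estimated frequencies), the `min_fit` threshold, and the functions `infer_type` and `apply_schema` are all assumptions made for illustration. The sketch walks a small weighted directed acyclic graph from a generic type toward more specific ones using a sample of a column, then applies the chosen type to the full column and reports values that do not conform.

```python
from datetime import datetime


def parse_date(text):
    """Parse an ISO-style date; raises ValueError on anything else."""
    return datetime.strptime(text, "%Y-%m-%d").date()


# Hypothetical graph: node -> (parser, [(child node, edge weight), ...]).
# Edge weights are made-up stand-ins for estimated frequencies.
GRAPH = {
    "string": (str, [("number", 0.5), ("date", 0.2)]),
    "number": (float, [("integer", 0.6)]),
    "integer": (int, []),
    "date": (parse_date, []),
}


def matches(node, value):
    """Return True if the node's parser accepts the raw string value."""
    try:
        GRAPH[node][0](value)
        return True
    except ValueError:
        return False


def infer_type(sample, min_fit=0.75):
    """Follow the best-scoring edge while enough of the sample fits."""
    node = "string"
    if not sample:
        return node
    while True:
        best, best_score = None, 0.0
        for child, weight in GRAPH[node][1]:
            fit = sum(matches(child, v) for v in sample) / len(sample)
            score = fit * weight
            if fit >= min_fit and score > best_score:
                best, best_score = child, score
        if best is None:
            return node  # no more specific type fits the sample
        node = best


def apply_schema(node, values):
    """Convert values with the chosen type; collect (index, value) errors."""
    clean, errors = [], []
    for index, value in enumerate(values):
        try:
            clean.append(GRAPH[node][0](value))
        except ValueError:
            clean.append(None)
            errors.append((index, value))
    return clean, errors


column = ["1", "2", "x", "4", "5", "6"]
chosen = infer_type(column[:5])  # only a portion of the data is inspected
clean, errors = apply_schema(chosen, column)
```

Tolerating a fraction of non-matching sample values (here via `min_fit`) is what lets the sketch pick a specific type for a mostly clean column and still flag the outliers as errors, rather than falling back to a plain string.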
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.