Patent · US Active

Repairing data through domain knowledge

US10127268B2 · kind B2 · utility

Cited by: 0
References: 6
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Oct 7, 2016
Grant date: Nov 13, 2018
Priority date
Expiry date: May 17, 2037

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/355
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Correcting data in a dataset. A set of data tokens from a tabular data store is grouped into a plurality of different clusters based on similarity of tokens. A reference cluster is selected from among the plurality of different clusters, such that the plurality of clusters includes the reference cluster and one or more other clusters. One or more tokens in the one or more other clusters are transformed. Transforming tokens is performed based on a cost of transforming tokens. The effect on the reference cluster of adding the transformed tokens to the reference cluster is determined. Using this information, a correction for a token in the dataset is identified. The data store is updated to correct the token.
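
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented method. Every concrete choice is an assumption made for illustration: difflib string similarity as the similarity measure, greedy clustering, taking the largest cluster as the reference cluster, "1 - similarity" as the transformation cost, a homogeneity score as the effect on the reference cluster, and all function names and thresholds.

```python
from collections import Counter
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster_tokens(tokens, threshold=0.95):
    """Greedy clustering: a token joins the first cluster whose seed it resembles."""
    clusters = []
    for tok in tokens:
        for cluster in clusters:
            if similarity(tok, cluster[0]) >= threshold:
                cluster.append(tok)
                break
        else:
            clusters.append([tok])
    return clusters


def homogeneity(cluster):
    """Share of the cluster taken up by its most common token."""
    return Counter(cluster).most_common(1)[0][1] / len(cluster)


def propose_corrections(tokens, cluster_threshold=0.95, max_cost=0.3):
    """Map suspect tokens to a proposed replacement from the reference cluster."""
    clusters = cluster_tokens(tokens, cluster_threshold)
    reference = max(clusters, key=len)  # assumption: largest cluster is the reference
    corrections = {}
    for cluster in clusters:
        if cluster is reference:
            continue
        for tok in set(cluster):
            # Cheapest transformation of this token into a reference token.
            target = min(set(reference), key=lambda ref: 1 - similarity(tok, ref))
            cost = 1 - similarity(tok, target)
            if cost > max_cost:
                continue  # too expensive to transform: leave the token alone
            # Effect on the reference cluster of adding the transformed token.
            if homogeneity(reference + [target]) >= homogeneity(reference):
                corrections[tok] = target
    return corrections


def repair(tokens, corrections):
    """Apply the proposed corrections to a column of tokens."""
    return [corrections.get(tok, tok) for tok in tokens]


column = ["Seattle", "Seattle", "Seattle", "Seatle", "Boston", "Seattle"]
corrections = propose_corrections(column)
repaired = repair(column, corrections)
```

In this made-up column, "Seatle" is cheap to transform into the reference token and is proposed for correction, while "Boston" exceeds the cost limit and is left unchanged. The thresholds are arbitrary and would need tuning against real data.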

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.