Patent · US Active

System and process for record duplication analysis

US8554742B2 · kind B2 · utility

Cited by: 7
References: 8
Claims: 22
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jul 6, 2009
Grant date: Oct 8, 2013
Priority date
Expiry date: Jul 2, 2031

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06N7/01
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A system and process for record duplication analysis that relies on a multi-membership Bayesian analysis to determine the probability that records within a data set are matches. The Bayesian calculation may rely on objective data describing the data set as well as subjective assessments of the data set. In addition, a system and process for record duplication analysis may rely on the predetermination of probabilistic patterns, where the system only searches for patterns exceeding a chosen threshold. Work flow may include selecting which fields within each record should be analyzed, normalizing the values within those fields and removing default data, calculating possible patterns and their match probabilities, analyzing record pairs to determine which have patterns exceeding a chosen threshold to determine the presence of duplicates, and merging duplicates, closing transactions reflecting non-duplicates, identifying records having insufficient data to determine the existence or lack of a match, and/or rolling back accidental merges.
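The workflow in the abstract (select fields, normalize values and drop defaults, precompute agreement patterns with their match probabilities, then scan record pairs for patterns above a threshold) can be illustrated with a minimal Python sketch. This is not the patented method: it swaps in a simple naive-Bayes model that treats fields as independent, and the field names, the per-field probabilities, the prior, the default-value list, and every function name below are hypothetical choices made only for illustration.

```python
from itertools import combinations, product

# Assumed per-field probabilities (illustrative, not from the patent):
#   m = P(field agrees | pair is a duplicate)
#   u = P(field agrees | pair is not a duplicate)
FIELDS = {
    "name": (0.95, 0.01),
    "dob": (0.98, 0.003),
    "zip": (0.90, 0.05),
}
PRIOR = 0.001  # assumed prior probability that a random pair is a duplicate
DEFAULTS = {"", "unknown", "n/a"}  # assumed placeholder values to discard


def normalize(value):
    """Lowercase, collapse whitespace, and map default/missing values to None."""
    if value is None:
        return None
    v = " ".join(str(value).lower().split())
    return None if v in DEFAULTS else v


def pattern_probability(pattern):
    """Posterior match probability for one agreement pattern.

    pattern holds one entry per field: True (agree), False (disagree),
    or None (value missing on either side, so the field is skipped).
    """
    like_match, like_non = PRIOR, 1.0 - PRIOR
    for (m, u), state in zip(FIELDS.values(), pattern):
        if state is True:
            like_match *= m
            like_non *= u
        elif state is False:
            like_match *= 1.0 - m
            like_non *= 1.0 - u
    return like_match / (like_match + like_non)


def qualifying_patterns(threshold):
    """Precompute every pattern whose probability meets the threshold."""
    result = {}
    for p in product((True, False, None), repeat=len(FIELDS)):
        prob = pattern_probability(p)
        if prob >= threshold:
            result[p] = prob
    return result


def compare(a, b):
    """Build the agreement pattern for a pair of records."""
    pattern = []
    for field in FIELDS:
        x, y = normalize(a.get(field)), normalize(b.get(field))
        pattern.append(None if x is None or y is None else x == y)
    return tuple(pattern)


def find_duplicates(records, threshold=0.95):
    """Return (i, j, probability) for pairs showing a qualifying pattern."""
    wanted = qualifying_patterns(threshold)
    hits = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        p = compare(a, b)
        if p in wanted:
            hits.append((i, j, wanted[p]))
    return hits
```

Because the qualifying patterns are computed once up front, the pairwise scan reduces to a dictionary lookup per pair. The sketch still compares every pair; a real system would likely add a blocking or indexing step, and the merge, rollback, and insufficient-data handling mentioned in the abstract are not modeled here.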

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.