System and process for record duplication analysis
US8554742B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jul 6, 2009 |
| Grant date | Oct 8, 2013 |
| Priority date | — |
| Expiry date | Jul 2, 2031 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N7/01
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A system and process for record duplication analysis relies on a multi-membership Bayesian analysis to determine the probability that records within a data set are matches. The Bayesian calculation may rely on objective data describing the data set as well as subjective assessments of the data set. In addition, the system and process may rely on the predetermination of probabilistic patterns, where the system searches only for patterns exceeding a chosen threshold. The workflow may include: selecting which fields within each record should be analyzed; normalizing the values within those fields and removing default data; calculating possible patterns and their match probabilities; analyzing record pairs to identify those whose patterns exceed a chosen threshold, indicating duplicates; merging duplicates; closing transactions reflecting non-duplicates; identifying records with insufficient data to determine whether a match exists; and/or rolling back accidental merges.
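The workflow in the abstract can be illustrated with a minimal sketch. This is not the patented method: it substitutes a plain naive-Bayes calculation with independent fields for the "multi-membership" analysis, which the abstract does not specify. The field names, the per-field probabilities, the prior, the threshold, and the list of default values are all hypothetical numbers chosen for illustration.

```python
from itertools import combinations, product

# Hypothetical per-field probabilities (assumptions, not taken from the patent):
# m = P(field agrees | records are duplicates)
# u = P(field agrees | records are not duplicates)
FIELDS = {
    "name": (0.95, 0.01),
    "email": (0.90, 0.001),
    "city": (0.85, 0.10),
}
PRIOR = 0.01       # assumed prior probability that a random pair is a duplicate
THRESHOLD = 0.9    # assumed cutoff for treating a pattern as a match
DEFAULTS = {"", "n/a", "unknown"}  # assumed placeholder values to discard


def normalize(value):
    """Lowercase, collapse whitespace, and drop default placeholder values."""
    v = " ".join(str(value).lower().split())
    return None if v in DEFAULTS else v


def pattern_probability(pattern):
    """P(match | agreement pattern) by Bayes' rule, assuming independent fields.

    Each pattern entry is True (agree), False (disagree), or None (no data).
    """
    like_match, like_non = PRIOR, 1.0 - PRIOR
    for (m, u), agrees in zip(FIELDS.values(), pattern):
        if agrees is None:
            continue  # a missing value contributes no evidence either way
        like_match *= m if agrees else 1.0 - m
        like_non *= u if agrees else 1.0 - u
    return like_match / (like_match + like_non)


# Predetermine every possible pattern once, then keep only those above threshold.
PATTERNS = {
    p: pattern_probability(p)
    for p in product((True, False, None), repeat=len(FIELDS))
}
ACCEPTED = {p: prob for p, prob in PATTERNS.items() if prob >= THRESHOLD}


def pattern_of(a, b):
    """Agreement pattern for one record pair over the selected fields."""
    out = []
    for field in FIELDS:
        x, y = normalize(a.get(field, "")), normalize(b.get(field, ""))
        out.append(None if x is None or y is None else x == y)
    return tuple(out)


def find_duplicates(records):
    """Return (i, j, probability) for pairs whose pattern was accepted."""
    hits = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        p = pattern_of(a, b)
        if p in ACCEPTED:
            hits.append((i, j, ACCEPTED[p]))
    return hits
```

Because the pattern probabilities are computed once up front, the pairwise pass is reduced to a dictionary lookup per pair. Comparing all pairs is quadratic in the number of records, so a real system would likely restrict candidate pairs first; that step is omitted here.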
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.