Patent · US Active

Detecting duplicate records in databases

US7685090B2 · kind B2 · utility

Cited by	10

References	17

Claims	12

Family size	0

Assignee

Microsoft Corporation · US

Inventors

Surajit Chaudhuri · Redmond, US
Venkatesh Ganti · Redmond, US
Rohit Ananthakrishna · Kanchinakote, IN

Key dates

Filing date	Jul 14, 2005
Grant date	Mar 23, 2010
Priority date	—
Expiry date	Oct 9, 2026

Classification

Technology area (CPC Y)	Emerging Cross-Sectional Technologies
CPC primary	Y10S707/99942
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

The invention concerns the detection of duplicate tuples in a database. Previous domain-independent detection of duplicate tuples relied on standard similarity functions (e.g., edit distance, cosine metric) between multi-attribute tuples. However, such prior art approaches result in large numbers of false positives if they are used to identify domain-specific abbreviations and conventions. In accordance with the invention, a process for duplicate detection is implemented based on interpreting records from multiple dimensional tables in a data warehouse, which are associated with hierarchies specified through key-foreign key relationships in a snowflake schema. The invention exploits the extra knowledge available from the table hierarchy to develop a high-quality, scalable duplicate detection process.
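
The abstract does not spell out the procedure, so the following is only an illustrative sketch of the general idea, not the patented method. It assumes two hypothetical dimension tables (`countries` and `states`) linked by a foreign key, uses Python's `difflib.SequenceMatcher` as a stand-in textual similarity function, and uses Jaccard overlap of child records as the hierarchy signal. All names, data, and thresholds are invented for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy dimension tables (not taken from the patent).
# countries: primary key -> name
countries = {1: "United States", 2: "USA", 3: "Canada"}
# states: (state name, foreign key into countries)
states = [
    ("Washington", 1), ("Oregon", 1), ("Texas", 1),
    ("Washington", 2), ("Oregon", 2),
    ("Ontario", 3), ("Quebec", 3),
]


def text_sim(a, b):
    """Textual similarity in [0, 1]; a stand-in for edit distance or cosine."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def children_of(country_id):
    """Names of the state records that reference the given country."""
    return {name.lower() for name, fk in states if fk == country_id}


def child_overlap(id_a, id_b):
    """Jaccard overlap of the child sets of two parent records."""
    a, b = children_of(id_a), children_of(id_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def likely_duplicates(text_threshold=0.8, overlap_threshold=0.5):
    """Flag country pairs that look alike textually OR share most children."""
    pairs = []
    for id_a, id_b in combinations(sorted(countries), 2):
        if (text_sim(countries[id_a], countries[id_b]) >= text_threshold
                or child_overlap(id_a, id_b) >= overlap_threshold):
            pairs.append((id_a, id_b))
    return pairs


print(likely_duplicates())  # [(1, 2)]
```

In this toy data, "United States" and "USA" score poorly on textual similarity alone, yet they are flagged because two of their three distinct child states coincide. That is the kind of extra evidence a table hierarchy can supply where a string metric on a single attribute falls short.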

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.