Patent · US Active

Optimized subset processing for de-duplication

US10901996B2 · kind B2 · utility

Cited by	8
References	92
Claims	9
Family size	0

Assignee

Salesforce.com, Inc. · US

Inventors

Dai Duong Doan · Alameda, US
Arun Kumar Jagota · Sunnyvale, US
Chenghung Ker · Burlingame, US
Parth Vijay Vaishnav · San Francisco, US
Danil Dvinov · San Francisco, US
Dmytro Kudriavtsev · Belmont, US

Key dates

Filing date	Feb 24, 2016
Grant date	Jan 26, 2021
Priority date	—
Expiry date	Jan 16, 2038

Classification

Technology area (CPC G)	Physics
CPC primary	G06F16/285
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Some embodiments of the present invention include a method for identifying duplicate records from a group of records in a database system. The method includes generating a cluster of records from a group of records based on one or more keys; splitting the cluster of records into multiple subsets of records, each subset having fewer records than the cluster, wherein the splitting is triggered by the number of records in the cluster exceeding a threshold; causing duplicate sets of records in each of the subsets to be identified, wherein a duplicate set of records includes one or more records and, when a duplicate set includes two or more records, those records are duplicates of one another; merging all of the duplicate sets of records identified from the multiple subsets to form a first group of duplicate sets of records; and forming a representative set of records by selecting a representative record from each of the duplicate sets in the first group.
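The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the blocking key (`key_fn`), the pairwise matcher (`is_match`), fixed-size chunking when a cluster exceeds `threshold`, union-find grouping inside each subset, and the "most non-empty fields" rule for picking a representative are all hypothetical choices, since the abstract specifies none of them.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Tuple

Record = Dict[str, str]
DuplicateSet = List[Record]


def build_clusters(records: List[Record],
                   key_fn: Callable[[Record], Hashable]) -> List[List[Record]]:
    """Group records sharing the same key (key_fn is a hypothetical blocking key)."""
    clusters: Dict[Hashable, List[Record]] = defaultdict(list)
    for rec in records:
        clusters[key_fn(rec)].append(rec)
    return list(clusters.values())


def split_cluster(cluster: List[Record], threshold: int) -> List[List[Record]]:
    """Split only when the cluster exceeds the threshold; each subset is then smaller."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    if len(cluster) <= threshold:
        return [cluster]
    # Assumption: consecutive chunks of at most `threshold` records.
    return [cluster[i:i + threshold] for i in range(0, len(cluster), threshold)]


def find_duplicate_sets(subset: List[Record],
                        is_match: Callable[[Record, Record], bool]) -> List[DuplicateSet]:
    """Pairwise matching plus union-find; unmatched records become one-record sets."""
    parent = list(range(len(subset)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(subset)):
        for j in range(i + 1, len(subset)):
            if is_match(subset[i], subset[j]):
                parent[find(i)] = find(j)

    groups: Dict[int, DuplicateSet] = defaultdict(list)
    for i, rec in enumerate(subset):
        groups[find(i)].append(rec)
    return list(groups.values())


def pick_representative(dup_set: DuplicateSet) -> Record:
    """Assumption: keep the record with the most non-empty fields (first wins ties)."""
    return max(dup_set, key=lambda r: sum(1 for v in r.values() if v))


def process_cluster(cluster: List[Record],
                    is_match: Callable[[Record, Record], bool],
                    threshold: int) -> Tuple[List[DuplicateSet], List[Record]]:
    first_group: List[DuplicateSet] = []
    for subset in split_cluster(cluster, threshold):
        # Merge the duplicate sets found in every subset into one group.
        first_group.extend(find_duplicate_sets(subset, is_match))
    representatives = [pick_representative(s) for s in first_group]
    return first_group, representatives


def normalize(name: str) -> str:
    return "".join(ch for ch in name.lower() if ch.isalnum())


records = [
    {"id": "1", "name": "Acme Corp", "city": "Sunnyvale", "phone": ""},
    {"id": "2", "name": "ACME Corp.", "city": "Sunnyvale", "phone": "555-0100"},
    {"id": "3", "name": "Acme Corp", "city": "Sunnyvale", "phone": ""},
    {"id": "4", "name": "Globex", "city": "Sunnyvale", "phone": ""},
    {"id": "5", "name": "Acme Corp", "city": "Belmont", "phone": ""},
]

results = [
    process_cluster(cluster,
                    is_match=lambda a, b: normalize(a["name"]) == normalize(b["name"]),
                    threshold=2)
    for cluster in build_clusters(records, key_fn=lambda r: r["city"].lower())
]

for groups, reps in results:
    print([[r["id"] for r in g] for g in groups], [r["id"] for r in reps])
# [['1', '2'], ['3'], ['4']] ['2', '3', '4']
# [['5']] ['5']
```

Records 1 and 3 share a name but land in different subsets, so this single pass gives them separate representatives; the representative set (`2`, `3`, `4`) is still smaller than the four-record cluster, and the sketch stops there, as the abstract does. With the quadratic matcher used here, splitting caps the work at about k(k-1)/2 comparisons per subset of size k instead of n(n-1)/2 for the whole cluster.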

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.