Patent · US Active

Quality-performance optimized identification of duplicate data

US11573721B2 · kind B2 · utility

Cited by	0

References	0

Claims	20

Family size	0

Assignee

International Business Machines Corporation · US

Inventors

Soma Shekar Naganna · Bengaluru, IN
Abhishek Seth · Sidhauli, IN
Neeraj Ramkrishna Singh · Bengaluru, IN

Key dates

Filing date	Jun 24, 2021
Grant date	Feb 7, 2023
Priority date	—
Expiry date	Sep 23, 2041

Classification

Technology area (CPC G)	Physics
CPC primary	G06F9/5072
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

An approach is provided for optimized identification of duplicate data in a networked computing environment. An aggregate feature vector is created that is specific to an attribute of the data (e.g., a field that holds specific informational content). The aggregate feature vector has a set of dimensions, each of which defines a specific comparison function used to test for similarity between data entries in the attribute. Each dimension in the aggregate feature vector is assigned an effectiveness, and a cost is computed for each dimension. Based on the effectiveness and cost of each dimension, a subset of dimensions is selected to form an optimized feature vector. This optimized feature vector can then be used to analyze a dataset to find matching data.
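
The selection step described in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the patent: the `Dimension` class, the four comparison functions, the effectiveness and cost numbers, the cost `budget`, and the greedy effectiveness-per-cost ranking. The abstract only says that a subset is chosen based on effectiveness and cost, so the claimed method may differ.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, List


@dataclass
class Dimension:
    """One dimension of a feature vector (hypothetical structure)."""
    name: str
    compare: Callable[[str, str], float]  # similarity score in [0, 1]
    effectiveness: float                  # assigned value, illustrative scale
    cost: float                           # computed cost, must be positive


def exact_match(a: str, b: str) -> float:
    return 1.0 if a == b else 0.0


def case_insensitive(a: str, b: str) -> float:
    return 1.0 if a.lower() == b.lower() else 0.0


def token_jaccard(a: str, b: str) -> float:
    # Jaccard overlap of lowercase whitespace-separated tokens.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def sequence_ratio(a: str, b: str) -> float:
    # Character-level similarity from the standard library.
    return SequenceMatcher(None, a, b).ratio()


def select_optimized(dimensions: List[Dimension], budget: float) -> List[Dimension]:
    """Greedy selection by effectiveness per unit cost (an assumed rule)."""
    ranked = sorted(dimensions, key=lambda d: d.effectiveness / d.cost, reverse=True)
    chosen, spent = [], 0.0
    for dim in ranked:
        if spent + dim.cost <= budget:
            chosen.append(dim)
            spent += dim.cost
    return chosen


def similarity(a: str, b: str, dims: List[Dimension]) -> float:
    """Effectiveness-weighted average of the selected comparison functions."""
    total = sum(d.effectiveness for d in dims)
    return sum(d.effectiveness * d.compare(a, b) for d in dims) / total


# Aggregate feature vector for a hypothetical "name" attribute.
# Effectiveness and cost values are made up for the example.
aggregate = [
    Dimension("exact", exact_match, effectiveness=0.5, cost=1.0),
    Dimension("case_insensitive", case_insensitive, effectiveness=0.7, cost=2.0),
    Dimension("token_jaccard", token_jaccard, effectiveness=0.9, cost=3.0),
    Dimension("sequence_ratio", sequence_ratio, effectiveness=0.95, cost=20.0),
]

optimized = select_optimized(aggregate, budget=6.0)
score = similarity("John Smith", "john smith", optimized)
print([d.name for d in optimized], round(score, 3))
```

With these illustrative numbers, the expensive `sequence_ratio` dimension does not fit in the budget, so the optimized vector keeps only the three cheaper comparisons. A pair of entries would then be flagged as a candidate duplicate when `score` exceeds some threshold, which is likewise a free parameter here.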

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.