Patent · US Active

Quality-performance optimized identification of duplicate data

US11573721B2 · kind B2 · utility

Cited by: 0
References: 0
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jun 24, 2021
Grant date: Feb 7, 2023
Priority date:
Expiry date: Sep 23, 2041

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F9/5072
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

An approach is provided for optimized identification of duplicate data in a networked computing environment. An aggregate feature vector is created that is specific to an attribute of the data (e.g., a field that holds specific informational content). The aggregate feature vector has a set of dimensions, each of which defines a specific comparison function used to test for similarity between data entries in the attribute. Each dimension in the aggregate feature vector is assigned an effectiveness, and a cost is computed for each dimension. Based on the effectiveness and cost of each dimension, a subset of dimensions is selected to form an optimized feature vector. This optimized feature vector can then be used to analyze a dataset to find matching data.
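The abstract does not say how effectiveness is measured, how cost is computed, or how the two are combined to choose the subset. The sketch below is one possible reading under stated assumptions: each dimension carries a hand-assigned effectiveness score and a fixed relative cost, and the subset is chosen greedily by effectiveness-to-cost ratio under a cost budget (a common knapsack heuristic, not necessarily the patented method). The names `Dimension`, `select_dimensions`, `similarity`, `find_duplicates`, the three comparison functions, the budget, and the threshold are all hypothetical.

```python
import difflib
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# A comparison function returns a similarity score in [0, 1].
Comparator = Callable[[str, str], float]


@dataclass(frozen=True)
class Dimension:
    """One hypothetical dimension of an aggregate feature vector."""
    name: str
    compare: Comparator
    effectiveness: float  # assumed score: how well this comparison detects duplicates
    cost: float           # assumed relative compute cost per comparison

    def __post_init__(self) -> None:
        if self.cost <= 0:
            raise ValueError("cost must be positive")


# Hypothetical comparison functions for a text attribute.
def exact(a: str, b: str) -> float:
    return 1.0 if a == b else 0.0


def case_insensitive(a: str, b: str) -> float:
    return 1.0 if a.casefold() == b.casefold() else 0.0


def sequence_ratio(a: str, b: str) -> float:
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


def select_dimensions(dims: Sequence[Dimension], budget: float) -> List[Dimension]:
    """Greedy heuristic: take dimensions in order of effectiveness/cost
    while the total cost stays within the budget."""
    chosen: List[Dimension] = []
    spent = 0.0
    for d in sorted(dims, key=lambda d: d.effectiveness / d.cost, reverse=True):
        if spent + d.cost <= budget:
            chosen.append(d)
            spent += d.cost
    return chosen


def similarity(vector: Sequence[Dimension], a: str, b: str) -> float:
    """Effectiveness-weighted average of the per-dimension similarities."""
    if not vector:
        raise ValueError("feature vector is empty")
    total = sum(d.effectiveness for d in vector)
    return sum(d.effectiveness * d.compare(a, b) for d in vector) / total


def find_duplicates(vector: Sequence[Dimension], values: Sequence[str],
                    threshold: float) -> List[Tuple[int, int]]:
    """Return index pairs whose weighted similarity meets the threshold."""
    pairs = []
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if similarity(vector, values[i], values[j]) >= threshold:
                pairs.append((i, j))
    return pairs


aggregate = [
    Dimension("exact", exact, effectiveness=0.5, cost=1.0),
    Dimension("case_insensitive", case_insensitive, effectiveness=0.7, cost=1.5),
    Dimension("sequence_ratio", sequence_ratio, effectiveness=0.9, cost=10.0),
]
optimized = select_dimensions(aggregate, budget=3.0)
records = ["Acme Corp", "ACME CORP", "Globex"]

print([d.name for d in optimized])                        # ['exact', 'case_insensitive']
print(find_duplicates(optimized, records, threshold=0.5))  # [(0, 1)]
```

With a budget of 3.0, the expensive fuzzy comparison is dropped and only the two cheap checks remain, yet they still flag "Acme Corp" and "ACME CORP" as a likely duplicate pair. Raising the budget would admit `sequence_ratio` and trade speed for recall on near-miss spellings.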

Source: USPTO / EPO open patent data (objective bibliographic data and citation counts).