Quality-performance optimized identification of duplicate data
US11573721B2 · kind B2 · utility
Key dates
| Event | Date |
|---|---|
| Filing date | Jun 24, 2021 |
| Grant date | Feb 7, 2023 |
| Priority date | — |
| Expiry date | Sep 23, 2041 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F9/5072
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
An approach is provided for optimized identification of duplicate data in a networked computing environment. An aggregate feature vector is created that is specific to one attribute of the data (e.g., a field that holds specific informational content). Each dimension of the aggregate feature vector defines a specific comparison function used to test data entries in that attribute for similarity. Each dimension is assigned an effectiveness, and a cost is computed for each dimension. Based on these effectiveness and cost values, a subset of the dimensions is selected to form an optimized feature vector, which can then be used to analyze a dataset and find matching data.
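The abstract does not say how effectiveness or cost is measured, nor how the subset of dimensions is chosen, so the following is only a minimal sketch under stated assumptions. Each dimension pairs a comparison function with a fixed effectiveness score and a relative cost; the subset is picked greedily by effectiveness per unit cost under a cost budget (a knapsack-style heuristic chosen here for illustration, not the patent's stated method); and two entries are treated as matches when the effectiveness-weighted average of the selected comparisons reaches a threshold. All names (`Dimension`, `select_dimensions`, `find_duplicates`, `budget`, `threshold`) and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# A comparison function returns a similarity in [0, 1] for two attribute values.
Comparator = Callable[[str, str], float]


@dataclass(frozen=True)
class Dimension:
    """One dimension of the aggregate feature vector (hypothetical structure)."""
    name: str
    compare: Comparator
    effectiveness: float  # assumed: how well this test identifies duplicates, in [0, 1]
    cost: float           # assumed: relative cost of running this test, must be > 0


def exact(a: str, b: str) -> float:
    return 1.0 if a == b else 0.0


def case_insensitive(a: str, b: str) -> float:
    return 1.0 if a.casefold() == b.casefold() else 0.0


def token_jaccard(a: str, b: str) -> float:
    # Jaccard similarity of lower-cased whitespace tokens.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def select_dimensions(dims: Sequence[Dimension], budget: float) -> List[Dimension]:
    """Assumed policy: take dimensions in order of effectiveness per unit cost,
    skipping any whose cost would push the running total over the budget."""
    ranked = sorted(dims, key=lambda d: d.effectiveness / d.cost, reverse=True)
    chosen: List[Dimension] = []
    spent = 0.0
    for d in ranked:
        if spent + d.cost <= budget:
            chosen.append(d)
            spent += d.cost
    return chosen


def similarity(dims: Sequence[Dimension], a: str, b: str) -> float:
    """Effectiveness-weighted average of the selected comparison results."""
    total = sum(d.effectiveness for d in dims)
    if total == 0:
        return 0.0
    return sum(d.effectiveness * d.compare(a, b) for d in dims) / total


def find_duplicates(values: Sequence[str], dims: Sequence[Dimension],
                    threshold: float) -> List[Tuple[int, int]]:
    """Pairwise scan returning index pairs whose similarity meets the threshold."""
    pairs: List[Tuple[int, int]] = []
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if similarity(dims, values[i], values[j]) >= threshold:
                pairs.append((i, j))
    return pairs


# Hypothetical aggregate feature vector for a "company name" attribute.
aggregate = [
    Dimension("exact", exact, effectiveness=0.6, cost=1.0),
    Dimension("case_insensitive", case_insensitive, effectiveness=0.8, cost=1.5),
    Dimension("token_jaccard", token_jaccard, effectiveness=0.9, cost=4.0),
]
optimized = select_dimensions(aggregate, budget=3.0)
names = ["Acme Corp", "ACME corp", "Acme Corporation"]

print([d.name for d in optimized])                     # → ['exact', 'case_insensitive']
print(find_duplicates(names, optimized, threshold=0.5))  # → [(0, 1)]
```

With a budget of 3.0 the expensive token comparison is dropped, so "Acme Corporation" is no longer matched to the other two entries. Whether a given trade-off between match quality and comparison cost is acceptable is exactly what the effectiveness and cost values are meant to govern; a real system would presumably derive them from data rather than fix them by hand.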
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.