Efficient duplicate detection for machine learning data sets
US10963810B2 · kind B2 · utility
Key dates
| Date type | Date |
|---|---|
| Filing date | Dec 12, 2014 |
| Grant date | Mar 30, 2021 |
| Priority date | — |
| Expiry date | May 2, 2038 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N20/00
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
At a machine learning service, a determination is made that an analysis to detect whether at least a portion of contents of one or more observation records of a first data set are duplicated in a second set of observation records is to be performed. A duplication metric is obtained, indicative of a non-zero probability that one or more observation records of the second set are duplicates of respective observation records of the first set. In response to determining that the duplication metric meets a threshold criterion, one or more responsive actions are initiated, such as the transmission of a notification to a client of the service.
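The abstract does not say how the duplication metric is computed. The sketch below is one possible illustration and should not be read as the patented method. It uses a Bloom filter (a swapped-in technique) built over the first data set. The metric is the fraction of records in the second set that the filter reports as possibly present. Because Bloom filters give false positives but never false negatives, the value estimates the likelihood of duplication and errs toward overcounting. The names `BloomFilter`, `record_key`, `duplication_metric` and `check_duplicates` are hypothetical, as are the choice of which fields to compare, the threshold value, and the `notify` callback that stands in for "transmission of a notification to a client".

```python
import hashlib
from typing import Callable, Iterable, Sequence


class BloomFilter:
    """Fixed-size Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes) -> Iterable[int]:
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def record_key(record: Sequence[object], fields: Sequence[int]) -> bytes:
    # Serialize only the selected fields, so "at least a portion" of a record is compared.
    return "\x1f".join(repr(record[i]) for i in fields).encode("utf-8")


def duplication_metric(first: Iterable[Sequence[object]],
                       second: Sequence[Sequence[object]],
                       fields: Sequence[int]) -> float:
    """Fraction of records in `second` that may duplicate some record in `first`."""
    bloom = BloomFilter()
    for rec in first:
        bloom.add(record_key(rec, fields))
    if not second:
        return 0.0
    hits = sum(bloom.might_contain(record_key(rec, fields)) for rec in second)
    return hits / len(second)


def check_duplicates(first, second, fields, threshold: float,
                     notify: Callable[[str], None]) -> float:
    # Hypothetical responsive action: call `notify` when the metric meets the threshold.
    metric = duplication_metric(first, second, fields)
    if metric >= threshold:
        notify(f"about {metric:.0%} of the second data set may duplicate the first")
    return metric


# Usage: compare fields 1 and 2 of a new batch against a training set.
training_set = [(1, "a", 0.5), (2, "b", 0.7), (3, "c", 0.9)]
new_batch = [(1, "a", 0.5), (4, "d", 0.1)]
alerts = []
metric = check_duplicates(training_set, new_batch, fields=[1, 2],
                          threshold=0.25, notify=alerts.append)
print(metric, alerts)
```

A Bloom filter keeps memory fixed regardless of how large the first data set is, which suits a service-side check. The cost is that the metric can only overstate duplication, never understate it. If an exact count is needed and the data fits in memory, a plain `set` of keys would give one.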
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.