Patent · US Active

Efficient duplicate detection for machine learning data sets

US10963810B2 · kind B2 · utility

Cited by: 7
References: 33
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Dec 12, 2014
Grant date: Mar 30, 2021
Priority date
Expiry date: May 2, 2038

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06N20/00
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

At a machine learning service, a determination is made that an analysis to detect whether at least a portion of contents of one or more observation records of a first data set are duplicated in a second set of observation records is to be performed. A duplication metric is obtained, indicative of a non-zero probability that one or more observation records of the second set are duplicates of respective observation records of the first set. In response to determining that the duplication metric meets a threshold criterion, one or more responsive actions are initiated, such as the transmission of a notification to a client of the service.
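The flow described in the abstract (compute a duplication metric between two sets of observation records, compare it to a threshold, then trigger a responsive action) can be illustrated with a minimal Python sketch. The abstract does not say how the metric is computed, so everything here is an assumption made for illustration: the metric is taken to be the fraction of second-set records whose SHA-256 fingerprint also appears in the first set, and the names `record_fingerprint`, `duplication_metric`, `check_for_duplicates`, the `threshold` default, and the `notify` callback are all hypothetical and not drawn from the patent.

```python
import hashlib
from typing import Callable, Iterable, Sequence

Record = Sequence[object]


def record_fingerprint(record: Record) -> str:
    """Hash a record's field values into a fixed-size fingerprint (assumed scheme)."""
    # Join field representations with a separator unlikely to occur in the data.
    joined = "\x1f".join(repr(value) for value in record)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def duplication_metric(first_set: Iterable[Record],
                       second_set: Iterable[Record]) -> float:
    """Fraction of second-set records whose fingerprint also occurs in the first set."""
    seen = {record_fingerprint(r) for r in first_set}
    second = list(second_set)
    if not second:
        return 0.0
    duplicates = sum(1 for r in second if record_fingerprint(r) in seen)
    return duplicates / len(second)


def check_for_duplicates(first_set: Iterable[Record],
                         second_set: Iterable[Record],
                         threshold: float = 0.1,
                         notify: Callable[[str], None] = print) -> float:
    """Compute the metric and call `notify` when it meets the (hypothetical) threshold."""
    metric = duplication_metric(first_set, second_set)
    if metric >= threshold:
        notify(f"Duplication metric {metric:.2f} meets threshold {threshold:.2f}")
    return metric


if __name__ == "__main__":
    # Hypothetical data: a training set and a test set sharing one record.
    training = [("a", 1), ("b", 2), ("c", 3)]
    test = [("a", 1), ("d", 4)]
    check_for_duplicates(training, test, threshold=0.25)
```

In this sketch the notification callback stands in for the "transmission of a notification to a client"; exact fingerprint matching is only one possible choice, and an approximate structure such as a Bloom filter would trade some accuracy for lower memory use.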

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.