Patent · US Active

Machine learning training dataset optimization

US12046021B2 · kind B2 · utility

Cited by: 0
References: 0
Claims: 14
Family size: 0

Assignee

Inventors

Key dates

Filing date: May 3, 2021
Grant date: Jul 23, 2024
Priority date
Expiry date: Jan 18, 2043

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06N3/088
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method comprising: receiving a dataset comprising a plurality of data instances; extracting a feature vector representation of each of the data instances in the dataset; choosing a first data instance for adding to a subset of the dataset, wherein the first data instance is removed from the dataset; performing an iterative process comprising: (i) identifying one of the data instances in the dataset which represents a maximal information addition to the subset, based, at least in part, on measuring an information difference parameter between the feature vector representation of the identified data instance and the feature vector representations of all of the data instances in the subset, and (ii) adding the identified data instance to the subset and removing the identified data instance from the dataset, until the information difference parameter is lower than a predetermined threshold; and outputting the subset as a representative subset of the dataset.
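
The iterative selection described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not define the "information difference parameter", so this sketch assumes it is the Euclidean distance from a candidate's feature vector to its nearest member of the subset. The function name select_representative_subset, the first_index parameter, and the choice of the first instance are likewise hypothetical.

```python
import math


def select_representative_subset(features, threshold, first_index=0):
    """Greedy subset selection over precomputed feature vectors.

    ASSUMPTION: the "information difference parameter" is taken to be the
    Euclidean distance from a candidate to its nearest subset member. The
    abstract does not specify the measure; this is only one possible choice.
    Returns the indices of the selected instances, in selection order.
    """
    if not features:
        return []

    # Choose a first instance and move it from the dataset into the subset.
    subset = [first_index]

    # nearest[i] is the distance from remaining instance i to the closest
    # member of the subset; its keys are the instances still in the dataset.
    nearest = {
        i: math.dist(features[i], features[first_index])
        for i in range(len(features))
        if i != first_index
    }

    while nearest:
        # (i) Identify the remaining instance that adds the most information.
        candidate = max(nearest, key=nearest.get)

        # Stop once the best remaining addition falls below the threshold.
        if nearest[candidate] < threshold:
            break

        # (ii) Add it to the subset and remove it from the dataset.
        subset.append(candidate)
        del nearest[candidate]

        # Update each remaining instance's distance to the grown subset.
        for i in nearest:
            d = math.dist(features[i], features[candidate])
            if d < nearest[i]:
                nearest[i] = d

    return subset


# Hypothetical toy data: three well-separated groups of 2-D feature vectors.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.0, 0.1), (0.0, 5.0)]
selected = select_representative_subset(points, threshold=1.0)
```

With the toy data above, one instance from each group is kept and the near-duplicates are left in the dataset. A larger threshold yields a smaller subset; a threshold of zero keeps every instance.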

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.