Patent · US Active

Machine learning training dataset optimization

US12299965B2 · kind B2 · utility

Cited by: 0
References: 1
Claims: 18
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jun 26, 2024
Grant date: May 13, 2025
Priority date
Expiry date: Jun 26, 2044

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06N3/088
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method comprising: receiving a dataset comprising a plurality of data instances; extracting a feature vector representation of each of the data instances in the dataset; choosing a first data instance for adding to a subset of the dataset, wherein the first data instance is removed from the dataset; performing an iterative process comprising: (i) identifying one of the data instances in the dataset which represents a maximal information addition to the subset, based, at least in part, on measuring an information difference parameter between the feature vector representation of the identified data instance and the feature vector representations of all of the data instances in the subset, and (ii) adding the identified data instance to the subset and removing the identified data instance from the dataset, until the information difference parameter is lower than a predetermined threshold; and outputting the subset as a representative subset of the dataset.
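The iterative selection described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not define the "information difference parameter", how the first instance is chosen, or exactly when the threshold is tested, so the sketch assumes: Euclidean distance to the nearest subset member as the parameter, the first instance fixed by a caller-supplied index, and a threshold check made before each addition. The names info_difference and select_representative_subset are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def info_difference(candidate: Vector, subset: List[Vector]) -> float:
    """Assumed metric: Euclidean distance from the candidate to its
    nearest neighbour among the vectors already in the subset."""
    return min(math.dist(candidate, chosen) for chosen in subset)


def select_representative_subset(
    features: List[Vector],
    threshold: float,
    first_index: int = 0,  # assumption: the caller picks the seed instance
) -> List[int]:
    """Greedy selection; returns indices of the chosen instances."""
    if not features:
        return []

    # Move the first instance from the dataset into the subset.
    remaining = list(range(len(features)))
    remaining.remove(first_index)
    subset_idx = [first_index]

    while remaining:
        subset_vecs = [features[i] for i in subset_idx]
        # (i) Find the instance that adds the most information to the subset.
        best = max(
            remaining,
            key=lambda i: info_difference(features[i], subset_vecs),
        )
        # Stop once even the best candidate falls below the threshold
        # (assumed ordering: test before adding).
        if info_difference(features[best], subset_vecs) < threshold:
            break
        # (ii) Add it to the subset and remove it from the dataset.
        subset_idx.append(best)
        remaining.remove(best)

    return subset_idx


# Toy feature vectors: two near-duplicate pairs and one isolated point.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.1), (10.0, 0.0)]
selected = select_representative_subset(features, threshold=1.0)
print(selected)
```

With these toy vectors and a threshold of 1.0, the sketch keeps one instance from each cluster (indices 0, 4 and 3) and stops once only near-duplicates remain. In practice the feature vectors would come from whatever extraction step precedes selection, and a different difference measure could be substituted in info_difference without changing the loop.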

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.