Patent · US Active

Machine learning training dataset optimization

US12299965B2 · kind B2 · utility

Cited by	0
References	1
Claims	18
Family size	0

Assignee

DATALOOP LTD. · IL

Inventors

Or Shabtay · Afula, IL
Eran Shlomo · Haifa, IL

Key dates

Filing date	Jun 26, 2024
Grant date	May 13, 2025
Priority date	—
Expiry date	Jun 26, 2044

Classification

Technology area (CPC G)	Physics
CPC primary	G06N3/088
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A method comprising: receiving a dataset comprising a plurality of data instances; extracting a feature vector representation of each of the data instances in the dataset; choosing a first data instance for adding to a subset of the dataset, wherein the first data instance is removed from the dataset; performing an iterative process comprising: (i) identifying one of the data instances in the dataset which represents a maximal information addition to the subset, based, at least in part, on measuring an information difference parameter between the feature vector representation of the identified data instance and the feature vector representations of all of the data instances in the subset, and (ii) adding the identified data instance to the subset and removing the identified data instance from the dataset, until the information difference parameter is lower than a predetermined threshold; and outputting the subset as a representative subset of the dataset.
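The iterative selection described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the information difference parameter is computed, so this example assumes it is the Euclidean distance from a candidate's feature vector to its nearest neighbor in the subset. The function names `info_difference` and `select_representative_subset`, the `first_index` parameter, and the choice of the first instance are all hypothetical and are not taken from the patent.

```python
import math
from typing import List, Sequence


def info_difference(candidate: Sequence[float], subset: List[Sequence[float]]) -> float:
    """Assumed metric: Euclidean distance to the nearest vector already in the subset."""
    return min(math.dist(candidate, chosen) for chosen in subset)


def select_representative_subset(
    features: List[Sequence[float]], threshold: float, first_index: int = 0
) -> List[int]:
    """Greedily pick indices until the best remaining candidate adds too little."""
    if not features:
        return []
    # The first instance moves from the dataset into the subset.
    remaining = list(range(len(features)))
    remaining.remove(first_index)
    selected = [first_index]
    while remaining:
        subset_vectors = [features[i] for i in selected]
        # Step (i): find the remaining instance with the maximal information addition.
        best = max(remaining, key=lambda i: info_difference(features[i], subset_vectors))
        # Stop once even the best candidate falls below the threshold.
        if info_difference(features[best], subset_vectors) < threshold:
            break
        # Step (ii): add it to the subset and remove it from the dataset.
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy 2-D feature vectors: two near-duplicate pairs and one outlier.
    vectors = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
    print(select_representative_subset(vectors, threshold=1.0))
```

With the toy vectors above, the near-duplicate points are left out because each lies closer than the threshold to a point already selected. Under this assumed metric the loop is equivalent to farthest-point sampling with an early stop; other measures of information difference would fit the same loop structure.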

Source: USPTO / EPO open patent data. Bibliographic details and citation counts are reported as recorded there.