Patent · US Active

Machine learning training dataset optimization

US12046021B2 · kind B2 · utility

Cited by	0

References	0

Claims	14

Family size	0

Assignee

DATALOOP LTD. · IL

Inventors

Or Shabtay · Afula, IL
Eran Shlomo · Haifa, IL

Key dates

Filing date	May 3, 2021
Grant date	Jul 23, 2024
Priority date	—
Expiry date	Jan 18, 2043

Classification

Technology area (CPC G)	Physics
CPC primary	G06N3/088
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A method comprising: receiving a dataset comprising a plurality of data instances; extracting a feature vector representation of each of the data instances in the dataset; choosing a first data instance for adding to a subset of the dataset, wherein the first data instance is removed from the dataset; performing an iterative process comprising: (i) identifying one of the data instances in the dataset which represents a maximal information addition to the subset, based, at least in part, on measuring an information difference parameter between the feature vector representation of the identified data instance and the feature vector representations of all of the data instances in the subset, and (ii) adding the identified data instance to the subset and removing the identified data instance from the dataset, until the information difference parameter is lower than a predetermined threshold; and outputting the subset as a representative subset of the dataset.
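
The iterative selection described in the abstract can be sketched as a greedy loop. This is only an illustration under stated assumptions, not the patented implementation. The abstract does not define the information difference parameter, so the sketch assumes it is the Euclidean distance from a candidate's feature vector to its nearest vector already in the subset. It also assumes the feature vectors have already been extracted and that the first instance is chosen by index. The names `select_representative_subset`, `threshold`, and `first_index` are hypothetical.

```python
import math


def select_representative_subset(features, threshold, first_index=0):
    """Greedy subset selection over precomputed feature vectors.

    Assumptions (not taken from the patent text): the information
    difference of a candidate is its Euclidean distance to the nearest
    vector already in the subset, and the first instance is picked by index.
    Returns the indices of the selected instances in selection order.
    """
    if not features:
        return []

    # Instances still in the dataset (the first one moves to the subset).
    remaining = set(range(len(features)))
    remaining.discard(first_index)
    subset = [first_index]

    # Distance from each remaining instance to its nearest subset member.
    nearest = {
        i: math.dist(features[i], features[first_index]) for i in remaining
    }

    while remaining:
        # Candidate adding the most information; sorted() makes ties deterministic.
        best = max(sorted(remaining), key=lambda i: nearest[i])
        if nearest[best] < threshold:
            break  # Stop once the best candidate falls below the threshold.

        # Move the candidate from the dataset to the subset.
        subset.append(best)
        remaining.remove(best)
        del nearest[best]

        # Only the newly added member can reduce the nearest distances.
        for i in remaining:
            nearest[i] = min(nearest[i], math.dist(features[i], features[best]))

    return subset


if __name__ == "__main__":
    points = [(0.0,), (0.2,), (5.0,), (5.3,), (10.0,)]
    print(select_representative_subset(points, threshold=1.0))
```

With the five one-dimensional points above and a threshold of 1.0, the sketch returns indices `[0, 4, 2]`: the near-duplicates at 0.2 and 5.3 stay in the dataset because each lies within the threshold of an already selected point. Caching the nearest distance per candidate avoids recomputing distances to the entire subset on every iteration.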

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.