Patent · US Active

Generating balanced train-test splits for machine learning

US12327398B2 · kind B2 · utility

Cited by: 0
References: 11
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Dec 2, 2022
Grant date: Jun 10, 2025
Priority date
Expiry date: Feb 13, 2044

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06V2201/03
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

An embodiment for generating balanced train-test splits for machine learning analysis. The embodiment may automatically extract low-level features and high-level features from a series of received datasets. The embodiment may automatically determine a series of impactful features for each of the received datasets correlating to a corresponding label. The embodiment may automatically select subsets of impactful features. The embodiment may automatically cluster the received datasets to generate a series of clusters, each of the generated series of clusters corresponding to one of the selected subsets of impactful features. The embodiment may automatically generate train-test split versions using datasets from each cluster in each of the generated series of clusters. The embodiment may automatically score each of the generated train-test split versions and select a highest-scoring train-test split version.
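
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not say how impactful features are determined, how clustering is done, or how splits are scored, so every such choice here is an assumption made for illustration. Features are ranked by absolute Pearson correlation with the label, clustering is replaced by a simple above/below-median grouping, and a split is scored by how closely the label rate in the test part matches the train part. All function names (`impactful_features`, `cluster_by`, `make_split`, `balance_score`, `best_split`) are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations
from statistics import mean, pstdev


def correlation(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)


def impactful_features(rows, labels, top_k=2):
    """Assumed criterion: rank feature indices by |correlation| with the label."""
    ranked = sorted(
        range(len(rows[0])),
        key=lambda j: abs(correlation([r[j] for r in rows], labels)),
        reverse=True,
    )
    return ranked[:top_k]


def cluster_by(rows, subset):
    """Stand-in for clustering: group items by above/below-median per feature."""
    medians = {j: sorted(r[j] for r in rows)[len(rows) // 2] for j in subset}
    clusters = defaultdict(list)
    for i, r in enumerate(rows):
        clusters[tuple(r[j] >= medians[j] for j in subset)].append(i)
    return list(clusters.values())


def make_split(clusters, test_fraction, rng):
    """Draw test items from every multi-item cluster; singletons go to train."""
    train, test = [], []
    for members in clusters:
        shuffled = members[:]
        rng.shuffle(shuffled)
        n_test = max(1, round(len(shuffled) * test_fraction)) if len(shuffled) > 1 else 0
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test


def balance_score(train, test, labels):
    """Assumed score: negative gap between train and test label rates (0 is best)."""
    if not train or not test:
        return float("-inf")
    train_rate = mean(labels[i] for i in train)
    test_rate = mean(labels[i] for i in test)
    return -abs(train_rate - test_rate)


def best_split(rows, labels, test_fraction=0.25, tries=5, seed=0):
    """Generate split versions per feature subset and keep the highest-scoring one."""
    rng = random.Random(seed)
    features = impactful_features(rows, labels)
    subsets = [c for k in (1, 2) for c in combinations(features, k)]
    candidates = []
    for subset in subsets:
        clusters = cluster_by(rows, subset)
        for _ in range(tries):
            train, test = make_split(clusters, test_fraction, rng)
            candidates.append((balance_score(train, test, labels), train, test))
    return max(candidates, key=lambda c: c[0])
```

Calling `best_split(rows, labels)` with `rows` as a list of numeric feature tuples and `labels` as a list of 0/1 values returns a `(score, train_indices, test_indices)` tuple. In a real pipeline the median grouping would likely be replaced by a proper clustering algorithm and the score by a richer balance measure.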

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.