Generating balanced train-test splits for machine learning
US12327398B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Dec 2, 2022 |
| Grant date | Jun 10, 2025 |
| Priority date | — |
| Expiry date | Feb 13, 2044 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06V2201/03
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
An embodiment for generating balanced train-test splits for machine learning analysis. The embodiment may automatically extract low-level features and high-level features from a series of received datasets. The embodiment may automatically determine a series of impactful features for each of the received datasets correlating to a corresponding label. The embodiment may automatically select subsets of impactful features. The embodiment may automatically cluster the received datasets to generate series of clusters, each of the generated series of clusters corresponding to one of the selected subsets of impactful features. The embodiment may automatically generate train-test split versions using datasets from each cluster in each of the generated series of clusters. The embodiment may automatically score each of the generated train-test split versions and select a highest-scoring train-test split version.
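The pipeline the abstract outlines (rank impactful features, pick subsets, cluster per subset, build a split from every cluster, score, keep the best) can be illustrated with a minimal Python sketch. This is not the patented implementation: feature extraction is skipped (each row is assumed to already be a dict of numeric features), Pearson correlation stands in for the impact measure, an above/below-median grouping stands in for clustering, and the balance score is an invented heuristic. All function and parameter names (`impactful_features`, `cluster_by`, `split_from_clusters`, `balance_score`, `best_split`, `top_k`, `subset_size`, `test_frac`) are hypothetical.

```python
import itertools
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (sx * sy)


def impactful_features(rows, labels, top_k):
    """Assumed impact measure: keep the top_k features by |correlation| with the label."""
    names = list(rows[0].keys())
    ranked = sorted(
        names,
        key=lambda n: abs(pearson([r[n] for r in rows], labels)),
        reverse=True,
    )
    return ranked[:top_k]


def cluster_by(rows, subset):
    """Stand-in clustering: group row indices by above/below median on each feature."""
    medians = {n: statistics.median(r[n] for r in rows) for n in subset}
    clusters = {}
    for i, r in enumerate(rows):
        key = tuple(r[n] > medians[n] for n in subset)
        clusters.setdefault(key, []).append(i)
    return list(clusters.values())


def split_from_clusters(clusters, test_frac, rng):
    """Draw test items from every cluster so each cluster feeds both sides."""
    train, test = [], []
    for members in clusters:
        shuffled = members[:]
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_frac)
        test += shuffled[:n_test]
        train += shuffled[n_test:]
    return sorted(train), sorted(test)


def balance_score(rows, labels, train, test, features):
    """Invented heuristic: negated sum of standardized train/test mean gaps."""
    def gap(values):
        spread = statistics.pstdev(values) or 1.0
        tr = statistics.fmean(values[i] for i in train)
        te = statistics.fmean(values[i] for i in test)
        return abs(tr - te) / spread

    total = gap(labels) + sum(gap([r[n] for r in rows]) for n in features)
    return -total  # closer to zero means better balanced


def best_split(rows, labels, top_k=3, subset_size=2, test_frac=0.25, seed=0):
    """Build one split version per feature subset and return the highest-scoring one."""
    rng = random.Random(seed)
    feats = impactful_features(rows, labels, top_k)
    candidates = []
    for subset in itertools.combinations(feats, subset_size):
        train, test = split_from_clusters(cluster_by(rows, subset), test_frac, rng)
        if train and test:  # skip degenerate versions with an empty side
            score = balance_score(rows, labels, train, test, feats)
            candidates.append((score, subset, train, test))
    return max(candidates, key=lambda c: c[0])
```

Calling `best_split(rows, labels)` returns a tuple of the score, the feature subset that produced the winning clusters, and the sorted train and test index lists. A real system would likely swap in a proper clustering algorithm and a task-specific scoring function; the structure of the loop is the point of the sketch.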
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.