Computerized-system and method for generating a reduced size superior labeled training dataset for a high-accuracy machine learning classification model for extreme class imbalance of instances
US11361254B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Feb 24, 2020 |
| Grant date | Jun 14, 2022 |
| Priority date | — |
| Expiry date | Mar 10, 2041 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N20/20
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A computerized-system and method for generating a reduced-size superior labeled training-dataset for a high-accuracy machine-learning-classification model for extreme class imbalance by: (a) retrieving minority and majority class instances to mark them as related to an initial dataset; (b) retrieving a sample of majority instances; (c) selecting an instance to operate a clustering classification model on it and the instances marked as related to the initial dataset to yield clusters; (d) operating a learner model to: (i) measure each instance in the yielded clusters according to a differentiability and an indicativeness estimators; (ii) mark measured instances as related to an intermediate training dataset according to the differentiability and the indicativeness estimators; (e) repeating until a preconfigured condition is met; (f) applying a variation estimator on all marked instances as related to an intermediate training dataset to select most distant instances; and (g) marking the instances as related to a superior training-dataset.
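Steps (a) through (g) of the abstract can be pictured with a rough sketch. The abstract does not define the clustering model, the learner model, or the three estimators, so everything concrete here is an assumption made only for illustration: a tiny k-means stands in for the clustering model, "differentiability" is approximated by the share of cluster-mates carrying the other label, "indicativeness" by closeness to the cluster center, and the "variation estimator" by greedy farthest-point selection. All function and parameter names (`build_superior_dataset`, `n_pool`, `diff_threshold`, and so on) are hypothetical and are not taken from the patent.

```python
import numpy as np


def kmeans(X, k, iters=10, seed=0):
    """Tiny k-means; an assumed stand-in for the clustering model."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    dists = np.linalg.norm(X[:, None] - centers[None], axis=2)
    # Cluster label per point, and each point's distance to its own center.
    return dists.argmin(axis=1), dists.min(axis=1)


def farthest_point_subset(X, idx, n_keep):
    """Greedy farthest-point pick; an assumed 'variation estimator' for (f)."""
    if not idx:
        return []
    chosen, rest = [idx[0]], list(idx[1:])
    while rest and len(chosen) < n_keep:
        gaps = [min(np.linalg.norm(X[i] - X[c]) for c in chosen) for i in rest]
        chosen.append(rest.pop(int(np.argmax(gaps))))
    return chosen


def build_superior_dataset(X, y, n_initial=20, n_pool=50, n_rounds=10,
                           n_keep=15, k=3, diff_threshold=0.2, seed=0):
    """Return indices of a reduced training set (label 1 = minority class)."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)

    # (a) Initial dataset: all minority instances plus a few majority ones.
    init_major = rng.permutation(majority)[:n_initial]
    initial = np.concatenate([minority, init_major])

    # (b) A sample of the remaining majority instances.
    pool = rng.permutation(np.setdiff1d(majority, init_major))[:n_pool].tolist()

    intermediate = set()
    for _ in range(n_rounds):  # (e) assumed stop condition: fixed round count
        if not pool:
            break
        # (c) Cluster one sampled instance together with the initial dataset.
        members = np.append(initial, pool.pop())
        labels, dist = kmeans(X[members], min(k, len(members)), seed=seed)
        for j in np.unique(labels):
            mask = labels == j
            cluster, cluster_y = members[mask], y[members[mask]]
            # (d)(i) Assumed differentiability: share of cluster-mates
            # whose label differs from this instance's label.
            diff = np.array([np.mean(cluster_y != c) for c in cluster_y])
            # (d)(i) Assumed indicativeness: nearest to the cluster center.
            most_indicative = int(cluster[np.argmin(dist[mask])])
            # (d)(ii) Mark instances for the intermediate training dataset.
            intermediate.update(cluster[diff >= diff_threshold].tolist())
            intermediate.add(most_indicative)

    # (f) + (g) Keep the most mutually distant marked instances.
    return farthest_point_subset(X, sorted(intermediate), n_keep)


# Synthetic example: 300 majority points and 6 minority points in 2-D.
demo_rng = np.random.default_rng(1)
X_demo = np.vstack([demo_rng.normal(0, 1, (300, 2)), demo_rng.normal(3, 1, (6, 2))])
y_demo = np.array([0] * 300 + [1] * 6)
selected = build_superior_dataset(X_demo, y_demo)
print(len(selected), "of", len(y_demo), "instances kept")
```

The sketch only mirrors the control flow of the abstract: mark, cluster, score, repeat, then thin out by distance. A real implementation would follow the estimator definitions given in the patent claims rather than these proxies.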
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.