Patent · US Active

Generating data from imbalanced training data sets

US9224104B2 · kind B2 · utility

6Cited by
4References
20Claims
0Family size

Assignee

Inventors

Key dates

Filing dateSep 24, 2013
Grant dateDec 29, 2015
Priority date
Expiry dateJun 18, 2034

Classification

  • Technology area (CPC G)Physics
  • CPC primaryG06N20/00
  • WIPO fieldComputer technology
  • WIPO sectorElectrical engineering

Abstract

Injecting generated data samples into a minority data class of an imbalanced training data set is provided. In response to receiving an input to balance the imbalanced training data set that includes a majority data class and the minority data class, a set of data samples is generated for the minority data class. A distance is calculated from each data sample in the set of generated data samples to a center of a kernel that includes a set of data samples of the majority data class. Each data sample in the set of generated data samples is stored within a corresponding distance score bucket based on the calculated distance of a data sample. Generated data samples are selected from a number of highest ranking distance score buckets. The generated data samples selected from the number of highest ranking distance score buckets are injected into the minority data class.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.