Patent · US Active

Systems and methods for preparing data for use by machine learning algorithms

US10713597B2 · kind B2 · utility

Cited by: 5
References: 7
Claims: 27
Family size: 0

Assignee

Inventor

Key dates

Filing date: Jan 21, 2019
Grant date: Jul 14, 2020
Priority date
Expiry date: Jan 21, 2039

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06N5/022
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Historical data used to train machine learning algorithms can contain thousands of records with hundreds of fields, and inevitably includes faulty data that degrades the accuracy and utility of a primary model machine learning algorithm. To improve dataset integrity, the dataset is segregated into a clean dataset having no invalid data values and a faulty dataset containing the invalid data values. The clean dataset is used to produce a secondary model machine learning algorithm, trained to generate from plural complete data records a replacement value for a single invalid data value in a data record, and a tertiary model machine learning clustering algorithm, trained to generate from plural complete data records replacement values for multiple invalid data values. Substituting the replacement data values for the invalid data values in the faulty dataset creates augmented training data, which is combined with the clean data to train a more accurate and useful primary model.
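
The pipeline described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the patented method: records are numeric lists, an invalid value is represented as None, a k-nearest-neighbour average stands in for the secondary model, plain k-means stands in for the tertiary clustering model, and all function names are hypothetical.

```python
import math

def segregate(records):
    """Split records into clean (no None values) and faulty (at least one None)."""
    clean = [r for r in records if None not in r]
    faulty = [r for r in records if None in r]
    return clean, faulty

def _dist(a, b, fields):
    """Euclidean distance restricted to the given field indices."""
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in fields))

def impute_single(record, clean, k=2):
    """Secondary-model stand-in (assumption): average the missing field
    over the k clean records nearest on the observed fields."""
    missing = record.index(None)
    observed = [i for i in range(len(record)) if i != missing]
    nearest = sorted(clean, key=lambda c: _dist(record, c, observed))[:k]
    filled = list(record)
    filled[missing] = sum(c[missing] for c in nearest) / len(nearest)
    return filled

def fit_centroids(clean, k=2, iterations=10):
    """Tertiary-model stand-in (assumption): plain k-means on clean records."""
    all_fields = range(len(clean[0]))
    # Deterministic initialisation from the first k records, for illustration only.
    centroids = [list(c) for c in clean[:k]]
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for r in clean:
            idx = min(range(len(centroids)),
                      key=lambda j: _dist(r, centroids[j], all_fields))
            groups[idx].append(r)
        for j, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(group) for col in zip(*group)]
    return centroids

def impute_multiple(record, centroids):
    """Fill every missing field from the centroid nearest on the observed fields."""
    observed = [i for i, v in enumerate(record) if v is not None]
    best = min(centroids, key=lambda c: _dist(record, c, observed))
    return [best[i] if v is None else v for i, v in enumerate(record)]

def augment(records):
    """Return clean records plus repaired faulty records (augmented training data)."""
    clean, faulty = segregate(records)
    centroids = fit_centroids(clean)
    repaired = []
    for r in faulty:
        if r.count(None) == 1:
            repaired.append(impute_single(r, clean))
        else:
            repaired.append(impute_multiple(r, centroids))
    return clean + repaired
```

Calling `augment` on a list that mixes complete and incomplete records returns a list of the same length with no None values, which would then serve as training data for the primary model. A real system would likely need categorical fields, feature scaling, and a proper validity check rather than a None test.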

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.