Patent · US · Active

Systems and methods for preparing data for use by machine learning algorithms

US10713597B2 · kind B2 · utility

Cited by: 5

References: 7

Claims: 27

Family size: 0

Assignee

NeuralStudio SECZ · KY

Inventor

Jack Copper · Bodden Town, KY

Key dates

Filing date	Jan 21, 2019
Grant date	Jul 14, 2020
Priority date	—
Expiry date	Jan 21, 2039

Classification

Technology area (CPC section G)	Physics
CPC primary	G06N5/022
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Historical data used to train machine learning algorithms can have thousands of records with hundreds of fields and inevitably includes faulty data that affects the accuracy and utility of a primary model machine learning algorithm. To improve dataset integrity, the historical data is segregated into a clean dataset having no invalid data values and a faulty dataset having the invalid data values. The clean dataset is used to produce a secondary model machine learning algorithm trained to generate, from plural complete data records, a replacement value for a single invalid data value in a data record, and a tertiary model machine learning clustering algorithm trained to generate, from plural complete data records, replacement values for multiple invalid data values. Substituting the replacement data values for the invalid data values in the faulty dataset creates augmented training data, which is combined with the clean data to train a more accurate and useful primary model.
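The abstract names the stages of the pipeline but not the specific models. The following is a minimal Python sketch of that data flow under stated assumptions, not the patented implementation: a record is a list of numeric fields in which `None`, NaN or an infinity marks an invalid value; a k-nearest-neighbour mean stands in for the secondary model, and plain k-means stands in for the tertiary clustering model. All function names (`segregate`, `knn_impute`, `kmeans`, `cluster_impute`, `augment`) and their parameters are hypothetical.

```python
import math
from typing import List, Optional, Tuple

# Assumption (not from the patent): a record is a list of numeric fields,
# and None, NaN or +/-inf marks an invalid data value.
Record = List[Optional[float]]


def is_valid(v: Optional[float]) -> bool:
    return v is not None and math.isfinite(v)


def segregate(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Split records into a clean dataset (no invalid values) and a faulty one."""
    clean: List[Record] = []
    faulty: List[Record] = []
    for r in records:
        (clean if all(is_valid(v) for v in r) else faulty).append(r)
    return clean, faulty


def dist(a: Record, b: Record, cols: List[int]) -> float:
    """Euclidean distance computed over the given columns only."""
    return math.sqrt(sum((a[c] - b[c]) ** 2 for c in cols))


def knn_impute(record: Record, target: int, clean: List[Record], k: int = 3) -> float:
    """Hypothetical secondary-model stand-in: predict the single invalid field
    `target` as the mean of that field over the k clean records closest on
    the remaining fields."""
    cols = [c for c in range(len(record)) if c != target]
    nearest = sorted(clean, key=lambda row: dist(record, row, cols))[:k]
    return sum(row[target] for row in nearest) / len(nearest)


def kmeans(clean: List[Record], k: int, iters: int = 20) -> List[Record]:
    """Hypothetical tertiary-model stand-in: plain k-means on the clean records.
    The first k records seed the centroids so the sketch is deterministic."""
    cols = list(range(len(clean[0])))
    centroids = [list(r) for r in clean[:k]]
    for _ in range(iters):
        groups: List[List[Record]] = [[] for _ in centroids]
        for row in clean:
            nearest = min(range(len(centroids)), key=lambda i: dist(row, centroids[i], cols))
            groups[nearest].append(row)
        for i, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(r[c] for r in group) / len(group) for c in cols]
    return centroids


def cluster_impute(record: Record, centroids: List[Record]) -> Record:
    """Fill every invalid field from the centroid nearest on the valid fields."""
    valid = [c for c, v in enumerate(record) if is_valid(v)]
    best = min(centroids, key=lambda cen: dist(record, cen, valid))
    return [v if is_valid(v) else best[c] for c, v in enumerate(record)]


def augment(records: List[Record], k_neighbors: int = 3, k_clusters: int = 2) -> List[Record]:
    """Repair the faulty dataset and combine it with the clean dataset.
    Assumes at least k_clusters clean records are available."""
    clean, faulty = segregate(records)
    centroids = kmeans(clean, k_clusters)
    repaired: List[Record] = []
    for r in faulty:
        bad = [c for c, v in enumerate(r) if not is_valid(v)]
        if len(bad) == 1:  # single invalid value: neighbour-based stand-in
            fixed = list(r)
            fixed[bad[0]] = knn_impute(r, bad[0], clean, k_neighbors)
            repaired.append(fixed)
        else:  # several invalid values: cluster-based stand-in
            repaired.append(cluster_impute(r, centroids))
    return clean + repaired
```

Calling `augment(records)` returns the clean records followed by the repaired ones, which together would form the training set for the primary model. Routing records with exactly one invalid field to the neighbour-based stand-in and records with several to the cluster-based one mirrors the single/multiple split described in the abstract; a practical system would likely replace both stand-ins with trained models and validity rules suited to its own data.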

Source: USPTO / EPO open patent data (bibliographic data and citation counts).