Systems and methods for preparing data for use by machine learning algorithms
US10713597B2 · kind B2 · utility
Assignee
Inventor
Key dates
| Filing date | Jan 21, 2019 |
| Grant date | Jul 14, 2020 |
| Priority date | — |
| Expiry date | Jan 21, 2039 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N5/022
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Historical data used to train machine learning algorithms can have thousands of records with hundreds of fields, and inevitably includes faulty data that affects the accuracy and utility of a primary model machine learning algorithm. To improve dataset integrity, the dataset is segregated into a clean dataset having no invalid data values and a faulty dataset having the invalid data values. The clean dataset is used to produce two further models: a secondary model machine learning algorithm trained to generate, from plural complete data records, a replacement value for a single invalid data value in a data record, and a tertiary model machine learning clustering algorithm trained to generate, from plural complete data records, replacement values for multiple invalid data values. Substituting the replacement data values for the invalid data values in the faulty dataset creates augmented training data, which is combined with the clean data to train a more accurate and useful primary model.
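The data flow described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not name specific algorithms, so a k-nearest-neighbour mean stands in for the secondary model and a small k-means with farthest-point initialization stands in for the tertiary clustering model. All function names are hypothetical, every field is assumed to be numeric, and invalid values are assumed to be marked as `None` or NaN.

```python
import math

def is_invalid(v):
    # Assumption: invalid values are encoded as None or NaN.
    return v is None or (isinstance(v, float) and math.isnan(v))

def segregate(records):
    """Split records into a clean set (no invalid values) and a faulty set."""
    clean = [r for r in records if not any(is_invalid(v) for v in r)]
    faulty = [r for r in records if any(is_invalid(v) for v in r)]
    return clean, faulty

def _dist(a, b, fields):
    # Euclidean distance restricted to the given field indices.
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in fields))

def impute_single(record, clean, k=3):
    """Stand-in secondary model: mean of the field over the k nearest clean records."""
    missing = [i for i, v in enumerate(record) if is_invalid(v)]
    if len(missing) != 1:
        raise ValueError("impute_single expects exactly one invalid value")
    target = missing[0]
    known = [i for i in range(len(record)) if i != target]
    nearest = sorted(clean, key=lambda c: _dist(record, c, known))[:k]
    filled = list(record)
    filled[target] = sum(c[target] for c in nearest) / len(nearest)
    return filled

def kmeans(clean, k=2, iters=20):
    """Stand-in tertiary model: tiny k-means fitted on clean records only."""
    fields = range(len(clean[0]))
    # Farthest-point initialization keeps this sketch deterministic.
    centroids = [list(clean[0])]
    while len(centroids) < k:
        far = max(clean, key=lambda r: min(_dist(r, c, fields) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for r in clean:
            j = min(range(k), key=lambda j: _dist(r, centroids[j], fields))
            groups[j].append(r)
        for j, g in enumerate(groups):
            if g:  # leave a centroid unchanged if its cluster is empty
                centroids[j] = [sum(r[i] for r in g) / len(g) for i in fields]
    return centroids

def impute_multiple(record, centroids):
    """Fill every invalid field from the centroid nearest on the valid fields."""
    known = [i for i, v in enumerate(record) if not is_invalid(v)]
    best = min(centroids, key=lambda c: _dist(record, c, known))
    return [best[i] if is_invalid(v) else v for i, v in enumerate(record)]

def build_training_data(records, k_neighbours=3, k_clusters=2):
    """Return clean records plus faulty records with replacement values substituted."""
    clean, faulty = segregate(records)
    centroids = kmeans(clean, k=k_clusters)
    augmented = []
    for r in faulty:
        if sum(is_invalid(v) for v in r) == 1:
            augmented.append(impute_single(r, clean, k_neighbours))
        else:
            augmented.append(impute_multiple(r, centroids))
    return clean + augmented
```

Calling `build_training_data(records)` on a list of equal-length numeric records returns the combined dataset that would then be used to train the primary model. Fitting both stand-in models on the clean records alone mirrors the abstract's point that replacement values are derived only from complete data records.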
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.