System for automated data engineering for large scale machine learning
US11119992B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Event | Date |
| --- | --- |
| Filing date | Apr 23, 2018 |
| Grant date | Sep 14, 2021 |
| Priority date | — |
| Expiry date | Aug 20, 2039 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/273
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A data engineering system for machine learning at scale is disclosed. In one embodiment, the data engineering system includes: an ingest processing module having a schema update submodule and a feature statistics update submodule, wherein the schema update submodule is configured to discover new features and add them to a schema, and the feature statistics update submodule collects statistics for each feature to be used in an online transformation; a record store to store data from a data source; and a transformation module to receive a low dimensional data instance from the record store, to receive the schema and feature statistics from the ingest processing module, and to transform the low dimensional data instance into a high dimensional representation. One embodiment provides a method for data engineering for machine learning at scale, the method including calling a built-in feature transformation or defining a new transformation, specifying a data source and compressing and storing the data, providing ingest-time processing by automatically analyzing necessary statistics for features, and then generating a schema for a dataset for subsequent data engine…
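The data flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (`FeatureStats`, `IngestProcessor`, `transform`) is hypothetical. The choice of running mean and standard deviation as the collected statistics, and of standardization plus one-hot encoding as the transformation, is an assumption made for illustration only. The in-memory list stands in for the record store, and the compression step is omitted.

```python
import math
from dataclasses import dataclass, field


def _is_numeric(value):
    # bool is a subclass of int in Python; treat it as categorical here.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


@dataclass
class FeatureStats:
    """Running statistics for one feature (hypothetical structure)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations (Welford's method)
    categories: dict = field(default_factory=dict)  # category value -> index

    def update(self, value):
        if _is_numeric(value):
            self.count += 1
            delta = value - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (value - self.mean)
        else:
            self.categories.setdefault(value, len(self.categories))

    @property
    def std(self):
        # Population standard deviation; 0.0 when nothing has been seen.
        return math.sqrt(self.m2 / self.count) if self.count > 0 else 0.0


class IngestProcessor:
    """Assumed stand-in for the ingest processing module."""

    def __init__(self):
        self.schema = {}        # feature name -> "numeric" | "categorical"
        self.stats = {}         # feature name -> FeatureStats
        self.record_store = []  # in-memory stand-in for the record store

    def ingest(self, record):
        for name, value in record.items():
            if name not in self.schema:
                # Schema update: a feature seen for the first time is added.
                kind = "numeric" if _is_numeric(value) else "categorical"
                self.schema[name] = kind
                self.stats[name] = FeatureStats()
            # Feature statistics update.
            self.stats[name].update(value)
        self.record_store.append(record)


def transform(record, schema, stats):
    """Expand a low dimensional record into a sparse high dimensional dict."""
    out = {}
    for name, kind in schema.items():
        if name not in record:
            continue
        value = record[name]
        s = stats[name]
        if kind == "numeric" and _is_numeric(value):
            # Standardize using the statistics gathered at ingest time.
            out[name] = (value - s.mean) / s.std if s.std > 0 else 0.0
        elif kind == "categorical" and value in s.categories:
            # One-hot: one output dimension per observed category value.
            out[f"{name}={value}"] = 1.0
    return out
```

As a usage example, ingesting `{"age": 30, "city": "Oslo"}` and then `{"age": 50, "city": "Lima", "plan": "pro"}` adds `plan` to the schema when it first appears. Calling `transform` on a stored record then yields a standardized `age` and one key per categorical value, such as `city=Lima`. The output dimensionality therefore grows with the number of distinct category values observed at ingest.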
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.