Patent · US Active

System for automated data engineering for large scale machine learning

US11301438B2 · kind B2 · utility

Cited by: 1
References: 1
Claims: 10
Family size: 0

Assignee

Inventors

Key dates

Filing date: Sep 2, 2020
Grant date: Apr 12, 2022
Priority date
Expiry date: Sep 2, 2040

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/273
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A data engineering system for machine learning at scale is disclosed. In one embodiment, the system includes three components: an ingest processing module, a record store, and a transformation module. The ingest processing module has a schema update submodule, configured to discover new features and add them to a schema, and a feature statistics update submodule, which collects statistics for each feature for use in an online transformation. The record store stores data from a data source. The transformation module receives a low dimensional data instance from the record store, receives the schema and feature statistics from the ingest processing module, and transforms the low dimensional data instance into a high dimensional representation. Another embodiment provides a method for data engineering for machine learning at scale. The method includes calling a built-in feature transformation or defining a new transformation, specifying a data source and compressing and storing the data, providing ingest-time processing by automatically analyzing necessary statistics for features, and then generating a schema for a dataset for subsequent data engine…
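
The flow the abstract describes (discover features at ingest time, collect per-feature statistics, then expand a low dimensional record into a high dimensional vector) can be illustrated with a minimal Python sketch. This is not the patented implementation. The names `IngestProcessor`, `FeatureStats`, and `transform` are hypothetical, and the choice of min-max scaling for numeric features and one-hot encoding for string features is an assumption made only for illustration; the abstract does not say which statistics or transformations are used.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

Value = Union[int, float, str]


@dataclass
class FeatureStats:
    """Running statistics for one feature (hypothetical structure)."""
    count: int = 0
    minimum: float = float("inf")
    maximum: float = float("-inf")
    categories: List[str] = field(default_factory=list)


class IngestProcessor:
    """Illustrative stand-in for the schema and statistics submodules."""

    def __init__(self) -> None:
        # feature name -> "numeric" or "categorical"
        self.schema: Dict[str, str] = {}
        self.stats: Dict[str, FeatureStats] = {}

    def ingest(self, record: Dict[str, Value]) -> None:
        for name, value in record.items():
            # Schema update: register a feature the first time it is seen.
            # Assumes a feature keeps the same type across records.
            if name not in self.schema:
                kind = "categorical" if isinstance(value, str) else "numeric"
                self.schema[name] = kind
                self.stats[name] = FeatureStats()
            # Statistics update: track what the later transformation needs.
            s = self.stats[name]
            s.count += 1
            if self.schema[name] == "numeric":
                s.minimum = min(s.minimum, float(value))
                s.maximum = max(s.maximum, float(value))
            elif value not in s.categories:
                s.categories.append(str(value))


def transform(record: Dict[str, Value],
              schema: Dict[str, str],
              stats: Dict[str, FeatureStats]) -> List[float]:
    """Expand a low dimensional record into a wider numeric vector."""
    vector: List[float] = []
    for name in sorted(schema):  # fixed feature order for every record
        s = stats[name]
        if schema[name] == "numeric":
            span = s.maximum - s.minimum
            raw = float(record.get(name, s.minimum))
            # Min-max scaling (assumed); constant features map to 0.0.
            vector.append((raw - s.minimum) / span if span else 0.0)
        else:
            # One-hot encoding (assumed); a missing value yields all zeros.
            vector.extend(1.0 if record.get(name) == c else 0.0
                          for c in s.categories)
    return vector


if __name__ == "__main__":
    proc = IngestProcessor()
    for rec in [{"age": 20, "city": "Oslo"},
                {"age": 40, "city": "Lima"},
                {"age": 30, "city": "Oslo", "plan": "pro"}]:
        proc.ingest(rec)
    print(transform({"age": 30, "city": "Lima"}, proc.schema, proc.stats))
```

In this sketch the third record introduces a `plan` feature that the first two lacked, so the schema grows without any manual declaration, and a two-field input record becomes a four-element vector once the categorical features are expanded.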

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.