Patent · US Active

Tracking provenance in data science scripts

US11775862B2 · kind B2 · utility

Cited by: 0
References: 2
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jan 14, 2020
Grant date: Oct 3, 2023
Priority date
Expiry date: Jun 28, 2042

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06N20/20
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A system enables tracking of machine learning (“ML”) model data provenance. In particular, a computing device is configured to accept ML model code that, when executed, instantiates and trains an ML model; to parse the ML model code into a workflow intermediate representation (WIR); to semantically annotate the WIR to provide an annotated WIR; and to identify, based on the annotated WIR and an ML API corresponding to the ML model code, data from at least one data source that is relied upon by the ML model code when training the ML model. A WIR may be generated by building an abstract syntax tree (AST) from the ML model code and generating provenance relationships (PRs) based at least in part on relationships between nodes of the AST, wherein a PR comprises one or more input variables, an operation, a caller, and one or more output variables.
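
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes Python's standard ast module as the parser, and the names PR, extract_prs, sources_for_training, DATA_SOURCE_OPS, and TRAINING_OPS are hypothetical. The two small sets stand in for the semantic annotation step. Each top-level call becomes one PR holding input variables, an operation, a caller, and output variables, and a backward walk from the training call finds the data-loading calls it depends on.

```python
import ast
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class PR:
    """One provenance relationship: inputs, operation, caller, outputs."""
    inputs: Tuple[str, ...]
    operation: str
    caller: Optional[str]
    outputs: Tuple[str, ...]


def _names(nodes) -> Tuple[str, ...]:
    """Variable names appearing anywhere inside the given AST nodes."""
    return tuple(n.id for node in nodes for n in ast.walk(node)
                 if isinstance(n, ast.Name))


def extract_prs(source: str) -> List[PR]:
    """Turn each top-level call statement of a script into a PR."""
    prs = []
    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.Assign):
            value, targets = stmt.value, stmt.targets
        elif isinstance(stmt, ast.Expr):
            value, targets = stmt.value, []
        else:
            continue
        if not isinstance(value, ast.Call):
            continue
        func = value.func
        if isinstance(func, ast.Attribute):   # e.g. model.fit(...)
            operation, caller = func.attr, ast.unparse(func.value)
        elif isinstance(func, ast.Name):      # e.g. LogisticRegression()
            operation, caller = func.id, None
        else:
            continue
        args = list(value.args) + [kw.value for kw in value.keywords]
        prs.append(PR(_names(args), operation, caller, _names(targets)))
    return prs


# Hypothetical stand-in for semantic annotation: a real system would
# consult knowledge about the ML API instead of two hard-coded sets.
DATA_SOURCE_OPS = {"read_csv"}
TRAINING_OPS = {"fit"}


def sources_for_training(prs: List[PR]) -> List[PR]:
    """Walk backward from training calls to the data-source PRs they rely on."""
    # Simplification: the last assignment to a name is taken as its producer.
    producers = {out: pr for pr in prs for out in pr.outputs}
    found, seen = [], set()
    stack = [pr for pr in prs if pr.operation in TRAINING_OPS]
    while stack:
        pr = stack.pop()
        if pr in seen:
            continue
        seen.add(pr)
        if pr.operation in DATA_SOURCE_OPS:
            found.append(pr)
        deps = pr.inputs + ((pr.caller,) if pr.caller else ())
        stack.extend(producers[name] for name in deps if name in producers)
    return found


SCRIPT = '''
df = pd.read_csv("train.csv")
y = df.pop("label")
model = LogisticRegression()
model.fit(df, y)
'''

if __name__ == "__main__":
    prs = extract_prs(SCRIPT)
    for pr in prs:
        print(pr)
    print("relied-upon sources:", sources_for_training(prs))
```

The sketch only handles top-level statements and treats the caller as a dependency alongside the input variables, so that y = df.pop("label") is linked back to the call that produced df. Handling nested scopes, control flow, and reassignment would require a richer intermediate representation than this example builds.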

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.