Patent · US Active

Pipelined approach to fused kernels for optimization of machine learning workloads on graphical processing units

US9972063B2 · kind B2 · utility

Cited by: 8
References: 2
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jul 30, 2015
Grant date: May 15, 2018
Priority date
Expiry date: Mar 28, 2036

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06N20/00
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method for optimization of machine learning (ML) workloads on a graphics processor unit (GPU). The method includes identifying a computation having a generic pattern commonly observed in ML processes. An optimized fused GPU kernel is employed to exploit temporal locality for inherent data-flow dependencies in the identified computation. Hierarchical aggregation spanning a memory hierarchy of the GPU for processing for the identified computation is performed. GPU kernel launch parameters are estimated following an analytical model that maximizes thread occupancy and minimizes atomic writes to GPU global memory.
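
The fusion and hierarchical aggregation ideas in the abstract can be illustrated with a small CPU-side simulation. This is only a sketch under stated assumptions, not the patented kernel: the pattern X^T (X y) is chosen here as an example of a computation with a data-flow dependency, and the function name `fused_xt_xy`, the `rows_per_block` parameter, and the write counter are hypothetical. Each row is read once and used twice (temporal locality), partial results are combined per simulated block, and only one aggregated update per block reaches the simulated global result.

```python
def fused_xt_xy(X, y, rows_per_block=4):
    """Simulate a fused, hierarchically aggregated computation of X^T (X y).

    X is a list of rows, y a list of floats. Returns the result vector and
    the number of aggregated updates made to the simulated global memory.
    All names here are illustrative assumptions, not taken from the patent.
    """
    n_cols = len(y)
    w_global = [0.0] * n_cols   # stands in for GPU global memory
    global_writes = 0           # counts aggregated updates to w_global

    for start in range(0, len(X), rows_per_block):
        # Stands in for a per-block buffer in faster on-chip memory.
        block_partial = [0.0] * n_cols
        for row in X[start:start + rows_per_block]:
            # The row is read once, used for the dot product ...
            p = sum(a * b for a, b in zip(row, y))
            # ... and reused immediately for the transposed update.
            for j, a in enumerate(row):
                block_partial[j] += p * a
        # One aggregated update per block instead of one per row.
        for j in range(n_cols):
            w_global[j] += block_partial[j]
        global_writes += 1

    return w_global, global_writes
```

In this sketch, larger `rows_per_block` values reduce the number of updates to the shared result, which mirrors, in a very simplified way, the trade-off the abstract describes between thread occupancy and atomic writes to global memory.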

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.