Pipelined approach to fused kernels for optimization of machine learning workloads on graphical processing units
US9972063B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jul 30, 2015 |
| Grant date | May 15, 2018 |
| Priority date | — |
| Expiry date | Mar 28, 2036 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N20/00
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method for optimization of machine learning (ML) workloads on a graphics processor unit (GPU). The method includes identifying a computation having a generic pattern commonly observed in ML processes. An optimized fused GPU kernel is employed to exploit temporal locality for inherent data-flow dependencies in the identified computation. Hierarchical aggregation spanning a memory hierarchy of the GPU for processing for the identified computation is performed. GPU kernel launch parameters are estimated following an analytical model that maximizes thread occupancy and minimizes atomic writes to GPU global memory.
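The fusion and hierarchical-aggregation ideas in the abstract can be illustrated with a minimal sketch. The abstract does not name the computation pattern, so the sketch assumes one that is common in ML, w = X^T (X y). The function names, the `rows_per_block` parameter, and the write counter are hypothetical, and plain sequential Python only stands in for GPU registers, per-block shared memory, and global memory. Each row of X is read once and reused for both products (temporal locality). Partial results are merged per block before touching the global result, which is what reduces the number of global updates.

```python
def unfused_xt_xy(X, y):
    """Reference: two separate passes, materializing the intermediate p = X y."""
    p = [sum(a * b for a, b in zip(row, y)) for row in X]
    return [sum(X[i][j] * p[i] for i in range(len(X))) for j in range(len(y))]


def fused_xt_xy(X, y, rows_per_block=2):
    """Fused single pass over the rows of X (assumed pattern: w = X^T (X y)).

    Returns the result and the number of updates made to the global vector.
    """
    n_cols = len(y)
    w = [0.0] * n_cols                      # stands in for GPU global memory
    global_writes = 0
    for start in range(0, len(X), rows_per_block):
        block_partial = [0.0] * n_cols      # stands in for per-block shared memory
        for row in X[start:start + rows_per_block]:
            # The dot product stays local (a register on a real GPU) and the
            # row is reused immediately, so p is never written out in full.
            p = sum(a * b for a, b in zip(row, y))
            for j, a in enumerate(row):
                block_partial[j] += p * a
        # One merge per block models the atomic writes to global memory.
        for j in range(n_cols):
            w[j] += block_partial[j]
        global_writes += n_cols
    return w, global_writes


X = [[1, 2], [3, 4], [5, 6], [7, 8]]
y = [1, 1]
w_blocked, writes_blocked = fused_xt_xy(X, y, rows_per_block=2)
w_per_row, writes_per_row = fused_xt_xy(X, y, rows_per_block=1)
```

In this toy model, larger blocks mean fewer global updates for the same result, which is the trade-off the abstract's launch-parameter estimation balances against thread occupancy. The analytical model itself is not described in the abstract and is not reproduced here.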
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.