Pipelined approach to fused kernels for optimization of machine learning workloads on graphical processing units
US10223762B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Mar 16, 2018 |
| Grant date | Mar 5, 2019 |
| Priority date | — |
| Expiry date | Mar 16, 2038 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N20/00
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method for optimization of machine learning (ML) workloads on a graphics processor unit (GPU). The method includes identifying a computation having a generic pattern commonly observed in ML processes. Hierarchical aggregation spanning a memory hierarchy of the GPU for processing is performed for the identified computation including maintaining partial output vector results in shared memory of the GPU. Hierarchical aggregation for vectors is performed including performing intra-block aggregation for multiple thread blocks of a partial output vector results on GPU global memory.
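The aggregation scheme in the abstract can be illustrated with a small CPU-side simulation. This is only a sketch under stated assumptions, not the patented method: the function name `hierarchical_xt_p`, the parameter `rows_per_block`, and the choice of w = Xᵀp as the example computation are all hypothetical. Each loop iteration stands in for one thread block that keeps its partial output vector in a local buffer (playing the role of shared memory) and then folds that buffer into a single result vector (playing the role of global memory).

```python
def hierarchical_xt_p(X, p, rows_per_block=2):
    """Compute w = X^T p with block-wise partial aggregation (illustrative only)."""
    n_cols = len(X[0]) if X else 0
    # Stands in for the output vector held in GPU global memory.
    global_out = [0.0] * n_cols

    # Each iteration models one thread block working on a slice of rows.
    for start in range(0, len(X), rows_per_block):
        # Stands in for the block's partial output vector in shared memory.
        shared_partial = [0.0] * n_cols
        stop = min(start + rows_per_block, len(X))
        for i in range(start, stop):
            for j, x_ij in enumerate(X[i]):
                shared_partial[j] += x_ij * p[i]

        # Combine this block's partial vector into the global result.
        # On a real GPU this step would need atomic adds or a second
        # reduction pass, because blocks run concurrently.
        for j in range(n_cols):
            global_out[j] += shared_partial[j]

    return global_out


if __name__ == "__main__":
    X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    p = [1.0, 1.0, 1.0]
    print(hierarchical_xt_p(X, p))
```

Keeping the partial vector local to a block means most additions touch fast per-block storage, and global memory is written only once per block per output element. The simulation shows the data flow only; it says nothing about actual GPU performance.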
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.