Apparatus and method for adaptable and efficient lane-wise tensor processing
US10776110B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Sep 29, 2018 |
| Grant date | Sep 15, 2020 |
| Priority date | — |
| Expiry date | Nov 12, 2038 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG06F12/0897
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
An apparatus and method for performing efficient, adaptable tensor operations. For example, one embodiment of a processor comprises: front end circuitry to schedule a plurality of matrix operations responsive to a tensor matrix multiplication instruction; a plurality of lanes to perform parallel execution of the matrix operations, each lane comprising: first, second, and third tile registers to store blocks of a first matrix (A), second matrix (B), and third matrix (C), respectively; at least one tensor arithmetic logic unit (TALU) to multiply a block of elements of the first matrix with a block of elements of the second matrix to generate a product and to accumulate the product with a block of elements of the third matrix, wherein each lane is to multiply one or more different blocks of the first and second matrix and to accumulate the resulting one or more products with one or more different blocks of the third matrix; and broadcast circuitry to broadcast one or more invariant matrix blocks to different tile registers within a lane and/or different tile registers across different lanes.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.