Apparatus and method for adaptable and efficient lane-wise tensor processing
US11379229B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Aug 7, 2020 |
| Grant date | Jul 5, 2022 |
| Priority date | — |
| Expiry date | Aug 7, 2040 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F12/0897
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
An apparatus and method for performing efficient, adaptable tensor operations. For example, one embodiment of a processor comprises: front end circuitry to schedule matrix operations responsive to a matrix multiplication instruction; a plurality of lanes to perform parallel execution of the matrix operations, wherein a lane comprises an arithmetic logic unit to multiply a block of a first matrix with a block of a second matrix to generate a product and to accumulate the product with a block of a third matrix, and wherein the matrix blocks are to be stored in registers within the lane; and broadcast circuitry to broadcast one or more invariant matrix blocks to at least one of different registers within the lane and different registers across different lanes.
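The data flow described in the abstract can be illustrated with a small software model. The following is a minimal sketch, not the patented circuitry: each "lane" holds its own blocks of the second and third matrices, while an invariant block of the first matrix is shared with every lane, standing in for the broadcast step. The names `block_mac` and `lane_wise_matmul`, the list-of-rows block layout, and the choice to treat the first-matrix block as the invariant one are all assumptions made for illustration.

```python
def block_mac(a, b, c):
    """Return c + a @ b for square blocks given as lists of rows."""
    n = len(a)
    return [
        [c[i][j] + sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def lane_wise_matmul(a_blocks, b_blocks, c_blocks):
    """Software model of lane-wise multiply-accumulate (hypothetical layout).

    a_blocks[k]       -- block of the first matrix for step k, invariant across lanes
    b_blocks[k][lane] -- block of the second matrix used by `lane` at step k
    c_blocks[lane]    -- block of the third matrix accumulated by `lane`
    """
    num_lanes = len(c_blocks)
    # Copy the accumulator blocks so the caller's inputs are not modified.
    acc = [[row[:] for row in blk] for blk in c_blocks]
    for k, a_blk in enumerate(a_blocks):
        # "Broadcast": every lane reads the same invariant block instead of
        # fetching its own copy.
        broadcast = [a_blk] * num_lanes
        for lane in range(num_lanes):
            # Each lane multiplies and accumulates independently; in hardware
            # the lanes would run in parallel rather than in a loop.
            acc[lane] = block_mac(broadcast[lane], b_blocks[k][lane], acc[lane])
    return acc


if __name__ == "__main__":
    a_blocks = [[[1, 2], [3, 4]]]                      # one step, one shared block
    b_blocks = [[[[1, 0], [0, 1]], [[2, 0], [0, 2]]]]  # one block per lane
    c_blocks = [[[0, 0], [0, 0]], [[1, 1], [1, 1]]]    # initial accumulators
    print(lane_wise_matmul(a_blocks, b_blocks, c_blocks))
```

In this model the shared block is referenced once per step and reused by all lanes, which is the kind of reuse the broadcast circuitry in the abstract is meant to provide; how the actual hardware selects and routes invariant blocks is not specified in this record.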
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.