Patent · US Active

Apparatus and method for adaptable and efficient lane-wise tensor processing

US10776110B2 · kind B2 · utility

16Cited by
3References
27Claims
0Family size

Assignee

Inventors

Key dates

Filing dateSep 29, 2018
Grant dateSep 15, 2020
Priority date
Expiry dateNov 12, 2038

Classification

  • Technology area (CPC G)Physics
  • CPC primaryG06F12/0897
  • WIPO fieldComputer technology
  • WIPO sectorElectrical engineering

Abstract

An apparatus and method for performing efficient, adaptable tensor operations. For example, one embodiment of a processor comprises: front end circuitry to schedule a plurality of matrix operations responsive to a tensor matrix multiplication instruction; a plurality of lanes to perform parallel execution of the matrix operations, each lane comprising: first, second, and third tile registers to store blocks of a first matrix (A), second matrix (B), and third matrix (C), respectively; at least one tensor arithmetic logic unit (TALU) to multiply a block of elements of the first matrix with a block of elements of the second matrix to generate a product and to accumulate the product with a block of elements of the third matrix, wherein each lane is to multiply one or more different blocks of the first and second matrix and to accumulate the resulting one or more products with one or more different blocks of the third matrix; and broadcast circuitry to broadcast one or more invariant matrix blocks to different tile registers within a lane and/or different tile registers across different lanes.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.