Method and apparatus for quantization and dequantization of neural network input and output data using processing-in-memory
US12387767B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jun 30, 2023 |
| Grant date | Aug 12, 2025 |
| Priority date | — |
| Expiry date | Feb 21, 2044 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG11C11/54
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
An apparatus and method for creating less computationally intensive nodes for a neural network. An integrated circuit includes a host processor and multiple memory channels, each with multiple memory array banks. Each of the memory array banks includes components of a processing-in-memory (PIM) accelerator and a scatter and gather circuit used to dynamically perform quantization operations and dequantization operations that offload these operations from the host processor. The host processor executes a data model that represents a neural network. The memory array banks store a single copy of a particular data value in a single precision. Therefore, the memory array banks avoid storing replications of the same data value with different precisions to be used by a neural network node. The memory array banks dynamically perform quantization operations and dequantization operations on one or more of the weight values, input data values, and activation output values of the neural network.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.