Separate quantization method of forming combination of 4-bit and 8-bit data of neural network
US11531884B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Sep 27, 2019 |
| Grant date | Dec 20, 2022 |
| Priority date | — |
| Expiry date | Oct 20, 2041 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N3/063
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A separate quantization method of forming a combination of 4-bit and 8-bit data of a neural network is disclosed. When a training data set and a validation data set exist, a calibration manner is used to determine a threshold for the activations of each of a plurality of layers of a neural network model, thereby determining how many of the activations undergo 8-bit quantization. In the weight quantization process, the weights of each layer are allocated to 4-bit weights and 8-bit weights according to a predetermined ratio, so that the neural network model has a reduced size and a combination of 4-bit and 8-bit weights.
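The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented procedure: the abstract does not say which weights receive 4 bits, how the calibration picks its threshold, or which quantizer is used. The sketch assumes that the smallest-magnitude weights go to 4 bits, that the activation threshold is a simple magnitude quantile over calibration data, and that quantization is symmetric and uniform. The names `quantize_symmetric`, `split_weights`, `calibrate_threshold`, `ratio_4bit`, and `fraction_8bit` are hypothetical.

```python
from typing import Dict, List, Tuple


def quantize_symmetric(values: List[float], bits: int) -> Tuple[List[int], float]:
    """Symmetric uniform quantization to signed `bits`-bit integer codes."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def split_weights(weights: List[float], ratio_4bit: float) -> Dict[str, object]:
    """Allocate one layer's weights to 4-bit and 8-bit groups by a fixed ratio.

    Assumption: the smallest-magnitude fraction `ratio_4bit` is stored in
    4 bits and the remainder in 8 bits. The source only states that a
    predetermined ratio is used, not which weights land in which group.
    """
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    n4 = int(len(weights) * ratio_4bit)
    idx4, idx8 = sorted(order[:n4]), sorted(order[n4:])
    q4, scale4 = quantize_symmetric([weights[i] for i in idx4], 4)
    q8, scale8 = quantize_symmetric([weights[i] for i in idx8], 8)
    return {"idx4": idx4, "q4": q4, "scale4": scale4,
            "idx8": idx8, "q8": q8, "scale8": scale8}


def calibrate_threshold(activations: List[float], fraction_8bit: float) -> float:
    """Pick a magnitude threshold from calibration activations.

    Assumption: activations whose magnitude is at or above the returned
    threshold would be the ones quantized to 8 bits, so roughly
    `fraction_8bit` of the calibration samples fall in that group.
    """
    if not activations:
        raise ValueError("calibration set must not be empty")
    mags = sorted(abs(a) for a in activations)
    k = int(len(mags) * (1.0 - fraction_8bit))
    k = min(max(k, 0), len(mags) - 1)
    return mags[k]


# Toy data standing in for one layer's weights and calibration activations.
layer_weights = [0.02, -0.9, 0.05, 0.4, -0.01, 0.7, 0.03, -0.2]
calib_acts = [0.1, 0.5, 2.0, 0.3, 4.0, 0.2, 0.05, 1.0]

packed = split_weights(layer_weights, ratio_4bit=0.5)
threshold = calibrate_threshold(calib_acts, fraction_8bit=0.25)
mixed_bits = 4 * len(packed["q4"]) + 8 * len(packed["q8"])
```

With half of the toy weights at 4 bits, the layer needs 48 bits for its codes instead of the 64 bits an all-8-bit layout would use, which is the size reduction the abstract refers to. A real implementation would also need to store the index split and the two scales.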
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.