Method for automatic hybrid quantization of deep artificial neural networks
US11763158B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Dec 4, 2020 |
| Grant date | Sep 19, 2023 |
| Priority date | — |
| Expiry date | Jun 1, 2041 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG06N3/045
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
A method includes, for each floating-point layer in a set of floating-point layers: calculating a set of input activations and a set of output activations of the floating-point layer; converting the floating-point layer to a low-bit-width layer; calculating a set of low-bit-width output activations based on the set of input activations; and calculating a per-layer deviation statistic of the low-bit-width layer. The method also includes ordering the set of low-bit-width layers based on the per-layer deviation statistic of each low-bit-width layer. The method additionally includes, while a loss-of-accuracy threshold exceeds the accuracy of the quantized network: converting a floating-point layer represented by the low-bit-width layer to a high-bit-width layer; replacing the low-bit-width layer with the high-bit-width layer in the quantized network; updating the accuracy of the quantized network; and, in response to the accuracy of the quantized network exceeding the loss-of-accuracy threshold, returning the quantized network.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.