Patent · US Active

Combining compression, partitioning and quantization of DL models for fitment in hardware processors

US12430558B2 · kind B2 · utility

Cited by	0
References	0
Claims	7
Family size	0

Assignee

Tata Consultancy Services Limited · IN

Inventors

Swarnava Dey · Sherghati, IN
Arpan Pal · Sherghati, IN
Gitesh Kulkarni · Kanchinakote, IN
Chirabrata Bhaumik · Sherghati, IN
Arijit Ukil · Sherghati, IN
Jayeeta Mondal · Sherghati, IN
Ishan Sahu · Sherghati, IN
Aakash Tyagi · Kanchinakote, IN
Amit Swain · Sherghati, IN
Arijit Mukherjee · Bengaluru, IN

Key dates

Filing date	Sep 14, 2021
Grant date	Sep 30, 2025
Priority date	—
Expiry date	Aug 1, 2044

Classification

Technology area (CPC G)	Physics
CPC primary	G06N3/088
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Small, compact deep learning (DL) models are required for embedded AI in several domains. Many industrial use cases require transforming already trained models for ensemble embedded systems, or re-training them for a given deployment scenario with limited data for transfer learning. Moreover, the hardware platforms used in embedded applications include FPGAs, AI hardware accelerators, Systems-on-Chip, and on-premises computing elements (Fog/Network Edge). These are interconnected through heterogeneous buses/networks with different capacities. The method of the present disclosure determines how to automatically partition a given deep neural network (DNN) across ensemble devices, considering the effect on the accuracy-latency-power tradeoff of intermediate compression and of quantization due to conversion to AI accelerator SDKs. The method is an iterative approach that obtains a set of partitions by repeatedly refining the partitions and generating a cascaded model for inference and training on ensemble hardware.
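The idea of repeatedly refining a set of partitions can be illustrated with a minimal sketch. It is not the claimed method: it assumes a purely sequential model, a fixed chain of devices, and a cost that counts latency only (accuracy, power, compression, and quantization effects are left out). The names `chain_latency`, `refine_partition`, `compute`, `size`, and `link_cost`, and all numbers, are hypothetical and chosen for illustration.

```python
from typing import List, Sequence, Tuple


def chain_latency(
    cuts: Sequence[int],
    compute: Sequence[Sequence[float]],
    size: Sequence[float],
    link_cost: Sequence[float],
) -> float:
    """Latency of running the layers in order along a chain of devices.

    Device d runs layers bounds[d] .. bounds[d+1]-1, and the tensor at
    boundary cuts[d] crosses the link between device d and device d+1.
    """
    n_layers = len(size) - 1
    bounds = [0, *cuts, n_layers]
    total = 0.0
    for d, per_layer in enumerate(compute):
        total += sum(per_layer[bounds[d]:bounds[d + 1]])
    for d, cut in enumerate(cuts):
        total += size[cut] * link_cost[d]
    return total


def refine_partition(
    compute: Sequence[Sequence[float]],
    size: Sequence[float],
    link_cost: Sequence[float],
    max_rounds: int = 100,
) -> Tuple[List[int], float]:
    """Local search: shift one cut point by one layer while latency drops."""
    n_layers = len(size) - 1
    n_devices = len(compute)
    # Assumed starting point: an even split of layers across devices.
    cuts = [n_layers * (k + 1) // n_devices for k in range(n_devices - 1)]
    best = chain_latency(cuts, compute, size, link_cost)
    for _ in range(max_rounds):
        improved = False
        for k in range(len(cuts)):
            for step in (-1, 1):
                cand = list(cuts)
                cand[k] += step
                lo = cuts[k - 1] if k > 0 else 0
                hi = cuts[k + 1] if k + 1 < len(cuts) else n_layers
                if not lo <= cand[k] <= hi:
                    continue  # cut points must stay ordered and in range
                cost = chain_latency(cand, compute, size, link_cost)
                if cost < best:
                    cuts, best, improved = cand, cost, True
        if not improved:
            break  # no single-layer move helps: a local optimum
    return cuts, best


# Hypothetical 6-layer model on 2 devices (all values made up).
compute = [[1, 1, 1, 1, 6, 6],   # per-layer latency on device 0
           [5, 5, 5, 5, 1, 1]]   # per-layer latency on device 1
size = [8, 8, 8, 8, 1, 8, 8]     # tensor size at each layer boundary
link_cost = [1.0]                # transfer cost per unit size, device 0 -> 1
cuts, best = refine_partition(compute, size, link_cost)
# cuts == [4]: layers 0-3 run on device 0, layers 4-5 on device 1.
```

The search starts from the even split at layer 3 and moves the cut to layer 4, where the intermediate tensor is smallest. Being a greedy local search, it can stop at a local optimum; a fuller treatment would fold accuracy and power into the cost and re-evaluate the model after each compression or quantization step.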

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.