Patent · US Active

System and method for efficiently amalgamated CNN-transformer architecture for mobile vision applications

US12373672B2 · kind B2 · utility

Cited by	0
References	0
Claims	20
Family size	0

Assignee

Mohamed bin Zayed University of Artificial Intelligence · AE

Inventors

Muhammad MAAZ · Abu Dhabi, AE
Abdelrahman SHAKER · Abu Dhabi, AE
Hisham Cholakkal · Abu Dhabi, AE
Salman Khan · West Babylon, US
Syed Waqas Zamir · Abu Dhabi, AE
Rao Muhammad Anwer · Abu Dhabi, AE
Fahad Shahbaz KHAN · Abu Dhabi, AE

Key dates

Filing date	Dec 9, 2022
Grant date	Jul 29, 2025
Priority date	—
Expiry date	Apr 5, 2044

Classification

Technology area (CPC G)	Physics
CPC primary	G06V2201/07
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

An edge computing system including processing circuitry, together with a computer-readable storage medium and a method, for object detection. The processing circuitry is configured with a hybrid CNN and vision transformer backbone network in an object detection deep learning network. The backbone network receives an image and includes a first convolutional encoder to extract local features from feature maps of the image; a second stage having consecutive second convolutional encoders; a positional encoding layer; split depth-wise transpose attention (SDTA) encoders; consecutive convolutional encoders; a third stage; and a fourth-stage SDTA encoder. Each SDTA encoder performs multi-headed self-attention by applying a dot-product operation across the channel dimension, computing the cross-covariance across channels to generate attention feature maps. The object detection network includes a convolutional network that produces a fixed-size collection of bounding boxes and scores for the presence of object class instances in those boxes.
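The channel-wise attention described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the transposed ("cross-covariance") attention formulation popularized by XCiT: Q and K are L2-normalized over the token axis and multiplied as Q^T K, so each head's attention map is (C/h × C/h) rather than (N × N). The function names, the projection matrices `w_q`, `w_k`, `w_v`, and the fixed `temperature` are hypothetical; the sketch covers only the attention step and omits the channel split and depth-wise convolutions that give SDTA its name.

```python
import numpy as np


def l2_normalize(x, axis, eps=1e-6):
    # Scale each vector along `axis` to unit L2 norm (eps avoids division by zero).
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def transposed_attention(x, w_q, w_k, w_v, num_heads, temperature=1.0):
    """Multi-head self-attention computed across channels (hypothetical sketch).

    x: (N, C) array of N spatial tokens with C channels.
    w_q, w_k, w_v: hypothetical (C, C) projection matrices.
    Returns an (N, C) array.
    """
    _, c = x.shape
    if c % num_heads != 0:
        raise ValueError("channels must be divisible by num_heads")
    d = c // num_heads
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    out = np.empty_like(v)
    for h in range(num_heads):
        sl = slice(h * d, (h + 1) * d)
        # Normalize each channel over the token axis so Q^T K holds
        # cosine similarities (cross-covariance) between channels.
        qh = l2_normalize(q[:, sl], axis=0)
        kh = l2_normalize(k[:, sl], axis=0)
        # (d, N) @ (N, d) -> (d, d): size depends on channels, not on N.
        attn = softmax(qh.T @ kh / temperature, axis=-1)
        # Output channel i is a weighted mix of value channels, weights attn[i, :].
        out[:, sl] = v[:, sl] @ attn.T
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((64, 32))  # e.g. an 8x8 feature map flattened, 32 channels
w_q, w_k, w_v = (rng.standard_normal((32, 32)) for _ in range(3))
y = transposed_attention(x, w_q, w_k, w_v, num_heads=4)
print(y.shape)  # (64, 32)
```

Because the attention map scales with the channel count instead of the number of spatial tokens, the cost grows linearly with image resolution, which is the usual motivation for this design on mobile hardware.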
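The "fixed-size collection of bounding boxes and scores" wording matches an SSD-style detection head. The sketch below assumes such a head, which the abstract does not confirm. Each feature-map location predicts 4 box offsets and a class distribution for each of `A` default boxes, so the total number of outputs depends only on the feature-map sizes, not on image content. For brevity it uses a 1×1 convolution (a per-location matrix multiply) instead of SSD's 3×3 convolutions, and it omits default-box decoding and non-maximum suppression. All names and sizes here are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def head_outputs(feat, w_loc, w_cls, num_anchors, num_classes):
    """Per-location predictions from one (H, W, C) feature map.

    Uses a 1x1 convolution (a matmul per location) as a simplification.
    Returns boxes (H*W*A, 4) and class scores (H*W*A, num_classes).
    """
    h, w, c = feat.shape
    tokens = feat.reshape(h * w, c)
    loc = tokens @ w_loc  # (H*W, A*4) box offsets
    cls = tokens @ w_cls  # (H*W, A*num_classes) class logits
    boxes = loc.reshape(h * w * num_anchors, 4)
    scores = softmax(cls.reshape(h * w * num_anchors, num_classes), axis=-1)
    return boxes, scores


def detect(feature_maps, weights, num_anchors, num_classes):
    # Concatenate predictions from every scale into one fixed-size set.
    boxes, scores = zip(*(
        head_outputs(f, wl, wc, num_anchors, num_classes)
        for f, (wl, wc) in zip(feature_maps, weights)
    ))
    return np.concatenate(boxes), np.concatenate(scores)


rng = np.random.default_rng(0)
C, A, K = 16, 4, 3  # channels, default boxes per location, classes (incl. background)
maps = [rng.standard_normal((8, 8, C)), rng.standard_normal((4, 4, C))]
weights = [(rng.standard_normal((C, A * 4)), rng.standard_normal((C, A * K))) for _ in maps]
boxes, scores = detect(maps, weights, A, K)
print(boxes.shape, scores.shape)  # (320, 4) (320, 3)
```

With an 8×8 and a 4×4 feature map and 4 default boxes per location, the head always emits (64 + 16) × 4 = 320 candidate boxes; a later filtering step would keep only high-scoring, non-overlapping ones.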

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.