Patent · US Active

System and method of bridging the gap between object and image-level representations for open-vocabulary detection

US12288372B2 · kind B2 · utility

Cited by	0

References	2

Claims	20

Family size	0

Assignee

Mohamed bin Zayed University of Artificial Intelligence · AE

Inventors

Hanoona Abdul Rasheed BANGALATH · Abu Dhabi, AE
Muhammad MAAZ · Abu Dhabi, AE
Muhammad Uzair KHATTAK · Abu Dhabi, AE
Salman KHAN · West Babylon, US
Fahad Shahbaz KHAN · Abu Dhabi, AE

Key dates

Filing date	Dec 20, 2022
Grant date	Apr 29, 2025
Priority date	—
Expiry date	Jan 10, 2044

Classification

Technology area (CPC G)	Physics
CPC primary	G06V2201/07
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

An object detection system and method in which a machine learning engine is configured with a region-based knowledge distillation stage that generates region embeddings from a training image having bounding boxes. A linear layer learns a region-level vision-language mapping for projecting feature embeddings from the training image to a common feature space shared by text embeddings to obtain the region embeddings. An image-level supervision stage generates pseudo-box labels for a classification training image and region embeddings from the training image having bounding boxes and corresponding class labels and the classification training image having an image-level label as input. Pseudo-box labels are determined on the classification training image as an image-level vision-language mapping. A weight transfer function conditions the image-level vision-language mapping on the learned region-level vision-language mapping. A trained object detector outputs a newly captured image annotated with a bounding box for a novel object.
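
The abstract describes two linked pieces: a linear layer that projects region features into the space shared with text embeddings, and a weight transfer function that conditions the image-level mapping on that learned region-level mapping. The following is a minimal sketch of how those pieces could fit together, not the patented implementation. The function names, the toy dimensions, the two-layer form of the transfer function with a leaky ReLU, and the cosine-similarity classifier are all assumptions made for illustration; the abstract does not specify them.

```python
import math
import random


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def leaky_relu(m, slope=0.1):
    """Element-wise leaky ReLU on a matrix (assumed activation, not from the abstract)."""
    return [[v if v > 0 else slope * v for v in row] for row in m]


def l2_normalize(v):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def project(weight, feature):
    """Linear layer: map a feature vector into the shared vision-language space."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weight]


def transfer_weights(w_region, w_phi, w_theta):
    """Hypothetical weight transfer: W_image = W_theta @ act(W_phi @ W_region).

    The image-level projection is derived from the region-level one, so the
    image-level mapping is conditioned on the region-level mapping.
    """
    return matmul(w_theta, leaky_relu(matmul(w_phi, w_region)))


def classify(embedding, text_embeddings):
    """Pick the class whose text embedding has the highest cosine similarity."""
    e = l2_normalize(embedding)
    scores = {
        name: sum(a * b for a, b in zip(e, l2_normalize(t)))
        for name, t in text_embeddings.items()
    }
    return max(scores, key=scores.get), scores


# Toy sizes: 4-d region features, 3-d shared space, 5-d hidden layer.
random.seed(0)
D_IN, D_OUT, HIDDEN = 4, 3, 5


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


w_region = rand_matrix(D_OUT, D_IN)    # region-level mapping (learned by distillation)
w_phi = rand_matrix(HIDDEN, D_OUT)     # transfer-function parameters (hypothetical)
w_theta = rand_matrix(D_OUT, HIDDEN)
w_image = transfer_weights(w_region, w_phi, w_theta)

# Placeholder text embeddings; a real system would obtain these from a text encoder.
text_embeddings = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}
region_feature = [0.2, -0.1, 0.4, 0.3]
label, scores = classify(project(w_image, region_feature), text_embeddings)
```

Because the class names enter only through their text embeddings, a novel category can be scored by adding its embedding to `text_embeddings` without changing the projection weights, which is the property an open-vocabulary detector relies on.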

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.