Patent · US Active

Apparatus and method for sharing and pruning weights for vision and language models

US12386873B2 · kind B2 · utility

Cited by	0
References	3
Claims	20
Family size	0

Assignee

SAMSUNG ELECTRONICS CO., LTD. · KR

Inventors

Shangqian Gao · 甘垛镇, CN
Burak Uzkent · Mountain View, US
Yilin Shen · Mountain View, US
Hongxia Jin · Cupertino, US

Key dates

Filing date	Sep 14, 2023
Grant date	Aug 12, 2025
Priority date	—
Expiry date	Sep 14, 2043

Classification

Technology area (CPC G)	Physics
CPC primary	G06N3/0464
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A method of performing multimodal tasks by using a multimodal model that includes a text encoder and a vision encoder may include: obtaining a text feature from a query via the text encoder; obtaining an image feature from one or more input images via the vision encoder; and outputting a response to the query based on a similarity between the text feature and the image feature, wherein weight vectors of the text encoder and the vision encoder are pruned and shared according to a sharing vector and a pruning vector that are generated by a hypernetwork, and wherein the hypernetwork and the multimodal model are jointly trained to minimize at least one of a difference between the weight vectors in the text encoder and the vision encoder, a difference between the weight vectors in different layers of the text encoder, and a number of parameters in the multimodal model.
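The inference path described in the abstract can be illustrated with a toy sketch. This is not the patented implementation. It assumes binary sharing and pruning vectors (1 = share or keep, 0 = do not), single-layer linear encoders, and cosine similarity as the similarity measure. All function names, weights, and inputs below are hypothetical, and the hypernetwork and joint training are not shown; the two vectors are simply given as fixed values.

```python
import math

def apply_sharing(own_weights, shared_weights, sharing_vector):
    """Assumed semantics: where the sharing entry is 1, reuse the other
    encoder's weight vector (row); otherwise keep the encoder's own row."""
    return [list(shared) if s == 1 else list(own)
            for own, shared, s in zip(own_weights, shared_weights, sharing_vector)]

def apply_pruning(weights, pruning_vector):
    """Assumed semantics: a pruning entry of 0 zeroes out that weight vector."""
    return [[w * keep for w in row] for row, keep in zip(weights, pruning_vector)]

def encode(weights, x):
    """Toy linear encoder: one output feature per weight vector (row)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def cosine_similarity(a, b):
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical weights; in the patent these vectors come from a hypernetwork.
text_own = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
vision_own = [[0.5, 0.5], [0.0, 2.0], [1.0, -1.0]]
sharing_vector = [1, 0, 1]   # vision rows 0 and 2 reuse the text rows
pruning_vector = [1, 1, 0]   # row 2 is pruned in both encoders

text_weights = apply_pruning(text_own, pruning_vector)
vision_weights = apply_pruning(
    apply_sharing(vision_own, text_own, sharing_vector), pruning_vector)

# Hypothetical inputs standing in for an embedded query and two images.
query = [1.0, 0.0]
images = [[0.0, 1.0], [2.0, 0.0]]

text_feature = encode(text_weights, query)
image_features = [encode(vision_weights, img) for img in images]

# The response is the image whose feature is most similar to the text feature.
scores = [cosine_similarity(text_feature, f) for f in image_features]
best = max(range(len(scores)), key=scores.__getitem__)
print(best, scores)
```

In this sketch, sharing is applied before pruning so that a pruned row is removed regardless of which encoder it came from. The order is an assumption made for illustration; the abstract does not specify it.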

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.