Patent · US Active

Efficientformer vision transformer

US12236668B2 · kind B2 · utility

Cited by	0
References	1
Claims	15
Family size	0

Assignee

SNAP INC. · US

Inventors

Jian Ren · Los Serranos, US
Yang Wen · San Francisco, US
Ju Hu · Shanghai, CN
Georgios Evangelidis · Wien, AT
Sergey Tulyakov · Santa Monica, US
Yanyu Li · Malden, US
Geng Yuan · Nanhu, CN

Key dates

Filing date	Jul 14, 2022
Grant date	Feb 25, 2025
Priority date	—
Expiry date	Jul 11, 2043

Classification

Technology area (CPC G)	Physics
CPC primary	G06V20/64
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A vision transformer network having extremely low latency and usable on mobile devices, such as smart eyewear devices and other augmented reality (AR) and virtual reality (VR) devices. The transformer network processes an input image and includes a convolution stem configured to patch embed the image. A first stack of stages, including at least two stages of 4-dimension (4D) metablocks (MBs) (MB4D), follows the convolution stem. A second stack of stages, including at least two stages of 3-dimension (3D) MBs (MB3D), follows the MB4D stages. Each of the MB4D stages and each of the MB3D stages includes different layer configurations, and each of the MB4D stages and each of the MB3D stages includes a token mixer. The MB3D stages each additionally include a multi-head self-attention (MHSA) processing block.
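
The stage layout described in the abstract can be sketched as a small configuration check plus a shape trace. This is an illustrative sketch only, not the patented implementation. It assumes that "4D" refers to tensors shaped (batch, channels, height, width) and "3D" to tensors shaped (batch, tokens, channels). The names `Stage`, `validate`, `trace_shapes`, and `EXAMPLE_STAGES`, the stem stride, the embedding width, the stage depths, and the token mixer labels are all hypothetical. Any downsampling between stages is omitted because the abstract does not describe it.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stage:
    kind: str         # "MB4D" or "MB3D"
    depth: int        # metablocks in this stage (hypothetical values below)
    token_mixer: str  # the abstract says every stage includes a token mixer
    has_mhsa: bool    # the abstract says MB3D stages additionally include MHSA


def validate(stages: List[Stage]) -> None:
    """Check the ordering constraints stated in the abstract."""
    kinds = [s.kind for s in stages]
    n4d, n3d = kinds.count("MB4D"), kinds.count("MB3D")
    if n4d < 2 or n3d < 2:
        raise ValueError("need at least two MB4D and two MB3D stages")
    if kinds != ["MB4D"] * n4d + ["MB3D"] * n3d:
        raise ValueError("MB4D stages must come first, followed by MB3D stages")
    for s in stages:
        if not s.token_mixer:
            raise ValueError("every stage needs a token mixer")
        if s.kind == "MB3D" and not s.has_mhsa:
            raise ValueError("MB3D stages must include an MHSA block")


def trace_shapes(
    image_hw: Tuple[int, int],
    stem_stride: int,
    embed_dim: int,
    stages: List[Stage],
) -> List[Tuple[str, tuple]]:
    """Return the symbolic tensor shape after the stem and after each stage."""
    validate(stages)
    # Assumption: the convolution stem reduces each spatial side by stem_stride.
    h, w = image_hw[0] // stem_stride, image_hw[1] // stem_stride
    shapes = [("stem", ("B", embed_dim, h, w))]
    for i, s in enumerate(stages):
        if s.kind == "MB4D":
            # Assumed 4D layout: (batch, channels, height, width).
            shapes.append((f"stage{i}:MB4D", ("B", embed_dim, h, w)))
        else:
            # Assumed 3D layout: spatial grid flattened into h * w tokens.
            shapes.append((f"stage{i}:MB3D", ("B", h * w, embed_dim)))
    return shapes


# Hypothetical configuration: two MB4D stages, then two MB3D stages,
# each with a different depth.
EXAMPLE_STAGES = [
    Stage("MB4D", depth=3, token_mixer="mixer_a", has_mhsa=False),
    Stage("MB4D", depth=2, token_mixer="mixer_a", has_mhsa=False),
    Stage("MB3D", depth=6, token_mixer="mixer_b", has_mhsa=True),
    Stage("MB3D", depth=4, token_mixer="mixer_b", has_mhsa=True),
]

if __name__ == "__main__":
    for name, shape in trace_shapes((224, 224), 4, 48, EXAMPLE_STAGES):
        print(name, shape)
```

Keeping the constraints in `validate` separates what the abstract actually states (stage order, minimum stage counts, a token mixer in every stage, MHSA in the MB3D stages) from the numeric choices, which are placeholders.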

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.