Efficientformer vision transformer
US12236668B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Event | Date |
| --- | --- |
| Filing date | Jul 14, 2022 |
| Grant date | Feb 25, 2025 |
| Priority date | — |
| Expiry date | Jul 11, 2043 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06V20/64
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A vision transformer network having extremely low latency and usable on mobile devices, such as smart eyewear devices and other augmented reality (AR) and virtual reality (VR) devices. The transformer network processes an input image and includes a convolution stem configured to patch embed the image. A first stack of stages, including at least two stages of 4-dimension (4D) metablocks (MB4D), follows the convolution stem. A second stack of stages, including at least two stages of 3-dimension (3D) metablocks (MB3D), follows the MB4D stages. Each of the MB4D stages and each of the MB3D stages includes different layer configurations, and each includes a token mixer. The MB3D stages each additionally include a multi-head self-attention (MHSA) processing block.
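The stage ordering described in the abstract can be sketched as a small shape-tracing script. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Stage`, `validate`, and `trace_shapes`, the stem stride of 4, the depth and width values, and the reading of "4D" as a (batch, channels, height, width) tensor and "3D" as a (batch, tokens, channels) tensor are all hypothetical choices made for the example. Any downsampling between stages is omitted for simplicity.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stage:
    kind: str   # "MB4D" or "MB3D"
    depth: int  # metablocks in the stage (hypothetical value)
    width: int  # channel / embedding width (hypothetical value)

    @property
    def has_mhsa(self) -> bool:
        # Per the abstract, only MB3D stages add an MHSA block.
        return self.kind == "MB3D"


def validate(stages: List[Stage]) -> None:
    """Check the ordering from the abstract: >=2 MB4D stages, then >=2 MB3D stages."""
    kinds = [s.kind for s in stages]
    n4, n3 = kinds.count("MB4D"), kinds.count("MB3D")
    if n4 < 2 or n3 < 2:
        raise ValueError("need at least two MB4D and two MB3D stages")
    if kinds != ["MB4D"] * n4 + ["MB3D"] * n3:
        raise ValueError("all MB4D stages must precede the MB3D stages")


def trace_shapes(image_hw: Tuple[int, int], stages: List[Stage],
                 stem_stride: int = 4, batch: int = 1):
    """Return (name, shape) pairs; stem_stride is an assumption, not from the source."""
    validate(stages)
    h, w = image_hw[0] // stem_stride, image_hw[1] // stem_stride
    # The convolution stem patch-embeds the image into a 4D feature map.
    shapes = [("stem", (batch, stages[0].width, h, w))]
    for i, s in enumerate(stages, start=1):
        if s.kind == "MB4D":
            shapes.append((f"stage{i}:MB4D", (batch, s.width, h, w)))
        else:
            # Assumed flattening of the spatial grid into h*w tokens.
            shapes.append((f"stage{i}:MB3D", (batch, h * w, s.width)))
    return shapes


if __name__ == "__main__":
    example = [Stage("MB4D", 2, 32), Stage("MB4D", 2, 64),
               Stage("MB3D", 2, 128), Stage("MB3D", 2, 256)]
    for name, shape in trace_shapes((224, 224), example):
        print(name, shape)
```

Giving each stage its own `depth` and `width` is one way to read "different layer configurations"; the abstract does not specify which layer parameters differ.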
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.