Masked autoencoders for computer vision
US12266160B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jul 27, 2022 |
| Grant date | Apr 1, 2025 |
| Priority date | — |
| Expiry date | Jul 22, 2043 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06V10/774
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
In particular embodiments, a computing system may access a plurality of images for pre-training a first machine-learning model that includes an encoder and a decoder. Using each image, the system may pre-train the model by dividing the image into a set of patches, selecting a first subset of the patches to be visible and a second subset of the patches to be masked during the pre-training, processing, using the encoder, the first subset of patches to generate corresponding first latent representations, processing, using the decoder, the first latent representations corresponding to the first subset of patches and mask tokens corresponding to the second subset of patches to generate reconstructed patches corresponding to the second subset of patches, the reconstructed patches and the first subset of patches being used to generate a reconstructed image, and updating the model based on comparisons between the image and the reconstructed image.
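The data flow described in the abstract can be illustrated with a minimal, pure-Python sketch. It is not the patented implementation: the function names, the 8x8 toy image, the 4x4 patch size, the 50% mask ratio, and the trivial stand-in encoder and decoder (simple averages rather than learned networks) are all assumptions chosen for illustration. The sketch only shows how patches are split into visible and masked subsets, how the reconstructed image is assembled, and how a comparison loss might be computed; the model update step is omitted.

```python
import random

def patchify(image, patch):
    """Split an H x W image (list of rows) into non-overlapping
    patch x patch tiles, flattened, in row-major order."""
    h, w = len(image), len(image[0])
    return [
        [image[r + dr][c + dc] for dr in range(patch) for dc in range(patch)]
        for r in range(0, h, patch)
        for c in range(0, w, patch)
    ]

def split_patches(num_patches, mask_ratio, rng):
    """Randomly pick visible and masked patch indices (mask_ratio is assumed)."""
    order = list(range(num_patches))
    rng.shuffle(order)
    num_masked = int(num_patches * mask_ratio)
    return sorted(order[num_masked:]), sorted(order[:num_masked])

def encode(patch_pixels):
    # Stand-in encoder (assumption): the latent is the patch's mean intensity.
    return sum(patch_pixels) / len(patch_pixels)

def decode(latents, num_masked, patch_len):
    # Stand-in decoder (assumption): each mask token is filled with the
    # mean of the visible latents, giving one predicted patch per masked slot.
    fill = sum(latents) / len(latents)
    return [[fill] * patch_len for _ in range(num_masked)]

def reconstruct(patches, visible, masked, predicted):
    """Combine the original visible patches with the predicted masked patches."""
    out = [None] * len(patches)
    for i in visible:
        out[i] = patches[i]
    for i, pred in zip(masked, predicted):
        out[i] = pred
    return out

def mse(patches_a, patches_b):
    """Mean squared error over all pixels of two patch lists."""
    total = sum(
        (a - b) ** 2
        for pa, pb in zip(patches_a, patches_b)
        for a, b in zip(pa, pb)
    )
    return total / sum(len(p) for p in patches_a)

rng = random.Random(0)
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
patches = patchify(image, 4)                      # 4 patches of 16 pixels each
visible, masked = split_patches(len(patches), 0.5, rng)
latents = [encode(patches[i]) for i in visible]   # encoder sees visible patches only
predicted = decode(latents, len(masked), len(patches[0]))
recon = reconstruct(patches, visible, masked, predicted)
loss = mse(patches, recon)                        # comparison of image and reconstruction
```

In a real system the encoder and decoder would be trainable networks and `loss` would drive a gradient update; here the visible patches contribute zero error because they are copied unchanged into the reconstruction.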
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.