Utilizing cross-attention guidance to preserve content in diffusion-based image modifications
US12333636B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Mar 3, 2023 |
| Grant date | Jun 17, 2025 |
| Priority date | — |
| Expiry date | Aug 12, 2043 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06T2207/20182
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
The present disclosure relates to systems, non-transitory computer-readable media, and methods for utilizing machine learning models to generate modified digital images. In particular, in some embodiments, the disclosed systems generate image editing directions between textual identifiers of two visual features utilizing a language prediction machine learning model and a text encoder. In some embodiments, the disclosed systems generate an inversion of a digital image utilizing a regularized inversion model to guide forward diffusion of the digital image. In some embodiments, the disclosed systems utilize cross-attention guidance to preserve structural details of a source digital image when generating a modified digital image with a diffusion neural network.
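Two of the ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation, and the abstract does not state how either quantity is computed. The sketch assumes that an editing direction is the difference between the mean text embeddings of target and source sentences, and that cross-attention guidance penalizes the squared difference between attention maps from the source reconstruction and from the edit. The names `toy_text_encoder`, `editing_direction`, `cross_attention_guidance_loss`, and `guidance_step` are hypothetical. The encoder is a deterministic stand-in for a real text encoder, and the sentences are written by hand rather than produced by a language prediction model.

```python
import numpy as np


def toy_text_encoder(sentence: str, dim: int = 8) -> np.ndarray:
    """Hypothetical stand-in for a text encoder.

    Returns a deterministic pseudo-random vector derived from the characters
    of the sentence. A real system would call a trained text encoder here.
    """
    seed = sum(ord(c) for c in sentence)
    rng = np.random.default_rng(seed)
    return rng.standard_normal(dim)


def editing_direction(source_sentences, target_sentences, encoder=toy_text_encoder):
    """Assumed form: mean target embedding minus mean source embedding."""
    src = np.stack([encoder(s) for s in source_sentences])
    tgt = np.stack([encoder(s) for s in target_sentences])
    return tgt.mean(axis=0) - src.mean(axis=0)


def cross_attention_guidance_loss(reference_maps: np.ndarray, edited_maps: np.ndarray) -> float:
    """Assumed form: mean squared difference between the two sets of attention maps."""
    return float(np.mean((edited_maps - reference_maps) ** 2))


def guidance_step(edited_maps: np.ndarray, reference_maps: np.ndarray, step_size: float = 0.1) -> np.ndarray:
    """One gradient-descent step on the loss above.

    Simplification: the gradient is taken with respect to the attention maps
    themselves. In a diffusion model the update would instead be applied to
    the noisy latent that produces those maps.
    """
    grad = 2.0 * (edited_maps - reference_maps) / edited_maps.size
    return edited_maps - step_size * grad


if __name__ == "__main__":
    direction = editing_direction(
        ["a cat", "a photo of a cat"],
        ["a dog", "a photo of a dog"],
    )
    print("direction shape:", direction.shape)

    reference = np.ones((4, 4))
    edited = np.zeros((4, 4))
    print("loss before:", cross_attention_guidance_loss(reference, edited))
    edited = guidance_step(edited, reference, step_size=1.0)
    print("loss after: ", cross_attention_guidance_loss(reference, edited))
```

In this toy setting the direction would be added to the text embedding that conditions the diffusion model, while the guidance step pulls the edit's attention maps toward those of the source image so that its structure is retained.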
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.