Embedding-based generative model for protein design
US12412637B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Event | Date |
|---|---|
| Filing date | May 11, 2021 |
| Grant date | Sep 9, 2025 |
| Priority date | — |
| Expiry date | Jul 11, 2044 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G16H70/60
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A system and method for designing protein sequences conditioned on a specific target fold. The system is a transformer-based generative framework for modeling the complex relationship between sequence and structure. To mitigate the heterogeneity between the sequence domain and the fold domain, a Fold-to-Sequence model jointly learns a sequence embedding using a transformer and a fold embedding from the density of secondary structural elements in 3D voxels. The joint sequence-fold representation is learned through novel intra-domain and cross-domain losses: an intra-domain loss forces two semantically similar samples (i.e., proteins that should have the same fold(s)) from the same domain to be close to each other in a latent space, while a cross-domain loss forces two semantically similar samples from different domains to be close to each other as well. In an embodiment, the Fold-to-Sequence model performs design tasks involving low-resolution structures, structures with a region of missing residues, and NMR structural ensembles.
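The abstract names two mechanisms without giving formulas: a fold embedding derived from the density of secondary structural elements in 3D voxels, and intra-/cross-domain losses over a shared latent space. The Python sketch below illustrates one plausible reading under loudly stated assumptions: a three-state SSE alphabet (helix `H`, strand `E`, coil `C`), a cubic grid centred on the origin, Euclidean distance, and a triplet margin loss as a stand-in for both losses. The names `sse_voxel_density`, `Embedded`, `triplet`, and `joint_loss`, and the triplet form itself, are hypothetical and do not come from the patent. The networks that would map sequences and voxel grids to embeddings are omitted.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Assumed SSE alphabet (3-state): helix, strand, coil. Not specified in the patent.
SSE_CLASSES = ("H", "E", "C")


def sse_voxel_density(
    coords: List[Tuple[float, float, float]],
    sse_labels: str,
    grid: int = 4,
    box: float = 40.0,
) -> List[List[float]]:
    """Per-voxel fraction of residues in each SSE class (one channel per class).

    Assumes one label per residue and coordinates centred on the origin inside
    a cube of side `box`; residues outside the cube are clamped into the
    nearest edge voxel. Each channel is a flattened grid**3 list; for a
    non-empty input, all channels together sum to 1.
    """
    channels = [[0.0] * grid**3 for _ in SSE_CLASSES]
    voxel = box / grid
    for (x, y, z), label in zip(coords, sse_labels):
        # Map each coordinate to a voxel index in [0, grid - 1].
        i, j, k = (min(grid - 1, max(0, int((c + box / 2) // voxel))) for c in (x, y, z))
        channels[SSE_CLASSES.index(label)][(i * grid + j) * grid + k] += 1.0
    n = max(1, len(coords))
    return [[v / n for v in ch] for ch in channels]


Vector = Sequence[float]


@dataclass
class Embedded:
    """One protein's embeddings in the shared latent space (hypothetical)."""
    seq: Vector   # output of the sequence encoder (a transformer, per the abstract)
    fold: Vector  # output of a fold encoder over the voxel densities


def triplet(anchor: Vector, positive: Vector, negative: Vector, margin: float) -> float:
    """Hinge loss: zero once the positive is at least `margin` closer than the negative."""
    return max(0.0, math.dist(anchor, positive) - math.dist(anchor, negative) + margin)


def joint_loss(a: Embedded, pos: Embedded, neg: Embedded, margin: float = 1.0) -> float:
    """Assumed combined loss: `pos` shares the anchor's fold; `neg` has a different fold."""
    # Intra-domain: same-fold samples close together within each domain.
    intra = triplet(a.seq, pos.seq, neg.seq, margin) + triplet(a.fold, pos.fold, neg.fold, margin)
    # Cross-domain: same-fold samples close together across the two domains.
    cross = triplet(a.seq, pos.fold, neg.fold, margin) + triplet(a.fold, pos.seq, neg.seq, margin)
    return intra + cross
```

Using a hinge on the distance gap, rather than only pulling positives together, keeps the embeddings from collapsing to a single point: the loss is zero when different-fold samples already sit well apart and grows when they crowd the anchor. Whether the patented losses take this triplet form is not stated in the abstract.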
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.