Patent · US Active

Embedding-based generative model for protein design

US12412637B2 · kind B2 · utility

Cited by: 0
References: 0
Claims: 25
Family size: 0

Assignee

Inventors

Key dates

Filing date: May 11, 2021
Grant date: Sep 9, 2025
Priority date
Expiry date: Jul 11, 2044

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G16H70/60
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A system and method for designing protein sequences conditioned on a specific target fold. The system is a transformer-based generative framework for modeling a complex sequence-structure relationship. To mitigate the heterogeneity between the sequence domain and the fold domain, a Fold-to-Sequence model jointly learns a sequence embedding using a transformer and a fold embedding from the density of secondary structural elements in 3D voxels. The joint sequence-fold representation is learned through novel intra-domain and cross-domain losses: the intra-domain loss forces two semantically similar samples from the same domain (that is, proteins that should have the same fold or folds) to be close to each other in a latent space, while the cross-domain loss forces two semantically similar samples from different domains to be close to each other. In an embodiment, the Fold-to-Sequence model performs design tasks that include low-resolution structures, structures with a region of missing residues, and NMR structural ensembles.
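
The abstract does not state the form of the two losses, so the following is only an illustrative sketch of the idea, not the patented formulation. It assumes each embedding is a plain non-zero vector, that "semantically similar" means sharing a fold label, and that closeness is measured as one minus cosine similarity. The function names (`pair_loss`, `intra_domain_loss`, `cross_domain_loss`) and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pair_loss(u, v):
    """Assumed penalty: 0 when two embeddings align, up to 2 when opposite."""
    return 1.0 - cosine_similarity(u, v)


def intra_domain_loss(embeddings, fold_labels):
    """Mean penalty over pairs from ONE domain that share a fold label."""
    total, count = 0.0, 0
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if fold_labels[i] == fold_labels[j]:
                total += pair_loss(embeddings[i], embeddings[j])
                count += 1
    return total / count if count else 0.0


def cross_domain_loss(seq_embeddings, seq_labels, fold_embeddings, fold_labels):
    """Mean penalty over (sequence, fold) pairs that share a fold label."""
    total, count = 0.0, 0
    for s, s_label in zip(seq_embeddings, seq_labels):
        for f, f_label in zip(fold_embeddings, fold_labels):
            if s_label == f_label:
                total += pair_loss(s, f)
                count += 1
    return total / count if count else 0.0


# Toy data (hypothetical): three sequence embeddings, two fold embeddings.
seq_emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
seq_lab = ["fold_a", "fold_a", "fold_b"]
fold_emb = [[1.0, 0.0], [0.0, 1.0]]
fold_lab = ["fold_a", "fold_b"]

# A combined objective would sum the terms; equal weighting is an assumption.
total_loss = (
    intra_domain_loss(seq_emb, seq_lab)
    + intra_domain_loss(fold_emb, fold_lab)
    + cross_domain_loss(seq_emb, seq_lab, fold_emb, fold_lab)
)
print(total_loss)
```

Minimizing the intra-domain terms pulls same-fold samples together within the sequence space and within the fold space, while the cross-domain term pulls a sequence embedding toward the embedding of its matching fold. A practical training objective would typically also need a term that pushes dissimilar samples apart, which this sketch omits.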

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.