您的位置:首页 > 娱乐 > 八卦 > LLM生成模型在生物蛋白质应用:ESM3

LLM生成模型在生物蛋白质应用:ESM3

2024/12/23 1:21:41 来源:https://blog.csdn.net/weixin_42357472/article/details/139971680  浏览:    关键词:LLM生成模型在生物蛋白质应用:ESM3

参考:
https://github.com/evolutionaryscale/esm

通过GPT模型原理,输入蛋白质序列等模态输出预测的蛋白质序列及结构
在这里插入图片描述

使用

参考:https://colab.research.google.com/github/evolutionaryscale/esm/blob/main/examples/generate.ipynb#scrollTo=Ta7VVnJLy7Wd

安装:

pip install esm

huggingface下载模型这个需要token:
https://huggingface.co/settings/tokens
在这里插入图片描述

from huggingface_hub import login
from esm.models.esm3 import ESM3
from esm.sdk.api import ESM3InferenceClient, ESMProtein, GenerationConfig# This will prompt you to get an API key from huggingface hub, make one with
# "Read" or "Write" permission and copy it back here.
login()# This will download the model weights and instantiate the model on your machine.
model: ESM3InferenceClient = ESM3.from_pretrained("esm3_sm_open_v1").to("cuda") # or "cpu"# Generate a completion for a partial Carbonic Anhydrase (2vvb)
prompt = "___________________________________________________DQATSLRILNNGHAFNVEFDDSQDKAVLKGGPLDGTYRLIQFHFHWGSLDGQGSEHTVDKKKYAAELHLVHWNTKYGDFGKAVQQPDGLAVLGIFLKVGSAKPGLQKVVDVLDSIKTKGKSADFTNFDPRGLLPESLDYWTYPGSLTTPP___________________________________________________________"
protein = ESMProtein(sequence=prompt)
# Generate the sequence, then the structure. This will iteratively unmask the sequence track.
protein = model.generate(protein, GenerationConfig(track="sequence", num_steps=8, temperature=0.7))
# We can show the predicted structure for the generated sequence.
protein = model.generate(protein, GenerationConfig(track="structure", num_steps=8))
protein.to_pdb("./generation.pdb")
# Then we can do a round trip design by inverse folding the sequence and recomputing the structure
protein.sequence = None
protein = model.generate(protein, GenerationConfig(track="sequence", num_steps=8))
protein.structure = None
protein = model.generate(protein, GenerationConfig(track="structure", num_steps=8))
protein.to_pdb("./round_tripped.pdb")

输入prompt一小段蛋白质序列,模型延迟生成_未知位置的序列
在这里插入图片描述
比如这两端各有3个_需要生成6个token填充_
在这里插入图片描述

再把生成的序列预测结构:

# We can show the predicted structure for the generated sequence.
protein = model.generate(protein, GenerationConfig(track="structure", num_steps=8))
protein.to_pdb("./generation.pdb")

在这里插入图片描述
在这里插入图片描述
蛋白质结构pymol展示:

!pip install py3Dmol
import py3Dmol# First we can create a `py3Dmol` view object
view = py3Dmol.view(width=500, height=500)
# py3Dmol requires the atomic coordinates to be in PDB format, so we convert the `ProteinChain` object to a PDB string
pdb_str = protein.to_pdb_string()
# Load the PDB string into the `py3Dmol` view object
view.addModel(pdb_str, "pdb")
# Set the style of the protein chain
view.setStyle({"cartoon": {"color": "spectrum"}})
# Zoom in on the protein chain
view.zoomTo()
# Display the protein chain
view.show()

在这里插入图片描述

版权声明:

本网仅为发布的内容提供存储空间,不对发表、转载的内容提供任何形式的保证。凡本网注明“来源:XXX网络”的作品,均转载自其它媒体,著作权归作者所有,商业转载请联系作者获得授权,非商业转载请注明出处。

我们尊重并感谢每一位作者,均已注明文章来源和作者。如因作品内容、版权或其它问题,请及时与我们联系,联系邮箱:809451989@qq.com,投稿邮箱:809451989@qq.com