esm

Comprehensive toolkit for protein language models including ESM3 (generative multimodal protein design across sequence, structure, and function) and ESM C (efficient protein embeddings and representations). Use this skill when working with protein sequences, structures, or function prediction; designing novel proteins; generating protein embeddings; performing inverse folding; or conducting protein engineering tasks. Supports both local model usage and cloud-based Forge API for scalable inference.

Install

Download and extract to your skills directory

Copy command and send to OpenClaw for auto-install:

Download and install this skill https://openskills.cc/api/download?slug=k-dense-ai-scientific-skills-esm&locale=en&source=copy

ESM: Protein Language Model and Design Tool

Skills Overview


ESM (Evolutionary Scale Modeling) is a toolkit for protein understanding, generation, and design. It provides the ESM3 multimodal generative model and the ESM C embedding models, supporting sequence generation, structure prediction, inverse folding, and other functions.

Applicable Scenarios

1. Protein Design and Engineering


Use when designing new proteins from scratch, optimizing existing protein sequences, or generating protein variants with specific functions. Supports function-conditioned generation, allowing design of protein sequences according to target functional properties. Suitable for enzyme engineering, antibody optimization, fluorescent protein design, and similar scenarios.

2. Protein Structure Prediction and Analysis


Use when predicting a protein's 3D structure from its amino acid sequence or performing inverse folding (designing sequences based on a structure). ESM3's structure track can generate 3D coordinates and PDB-format outputs, suitable for structural biology research and protein stability analysis.

3. Protein Embeddings and Feature Extraction


Use when converting protein sequences into numeric vectors for downstream machine learning tasks. ESM C models can generate high-quality protein representations, suitable for similarity computation, functional classification, clustering analysis, and other tasks.

Core Features

1. Multimodal Protein Generation


ESM3 supports generation across sequence, structure, and function tracks, either independently or jointly. Using a chain-of-thought approach, it can iteratively optimize protein designs: first predict structure, then refine sequence, and finally validate function. Supports local deployment and cloud-based Forge API calls, offering model choices from 1.4B to 98B parameters.
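The track-by-track generation described above can be sketched roughly as follows. This is one possible ordering (fill in a masked sequence, then predict a structure for it), not the toolkit's prescribed workflow. The helper `build_masked_prompt` and the function `generate_sequence_then_structure` are hypothetical names introduced for illustration. The calls inside follow the public `esm` package README as best understood and may differ between versions. The model is imported lazily, so the prompt helper works without the package installed.

```python
from typing import Dict

MASK = "_"  # mask character for unknown residues in an ESM3 sequence prompt


def build_masked_prompt(length: int, fixed: Dict[int, str]) -> str:
    """Return a prompt of `length` residues with only the `fixed` positions set."""
    residues = [MASK] * length
    for position, residue in fixed.items():
        if not 0 <= position < length:
            raise IndexError(f"position {position} is outside 0..{length - 1}")
        residues[position] = residue
    return "".join(residues)


def generate_sequence_then_structure(
    prompt: str,
    model_name: str = "esm3-sm-open-v1",  # assumption: local open model
    device: str = "cpu",
    num_steps: int = 8,
):
    """Fill masked positions on the sequence track, then generate structure."""
    # Lazy imports: these require the `esm` package and downloaded weights.
    from esm.models.esm3 import ESM3
    from esm.sdk.api import ESMProtein, GenerationConfig

    model = ESM3.from_pretrained(model_name).to(device)
    # Never ask for more decoding steps than there are masked positions.
    seq_steps = max(1, min(num_steps, prompt.count(MASK)))

    protein = ESMProtein(sequence=prompt)
    protein = model.generate(
        protein,
        GenerationConfig(track="sequence", num_steps=seq_steps, temperature=0.7),
    )
    protein = model.generate(
        protein, GenerationConfig(track="structure", num_steps=num_steps)
    )
    return protein
```

For example, `build_masked_prompt(5, {0: "M", 4: "K"})` fixes the first and last residues and leaves the middle three for the model to fill.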

2. Protein Embeddings and Representation Learning


ESM C offers models at 300M, 600M, and 6B scales for generating protein embedding vectors. It supports batch processing of multiple sequences, suitable for feature extraction from large-scale protein datasets. The generated embeddings can be used for protein classification, function prediction, similarity search, and other downstream tasks.
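A minimal sketch of how such embeddings might feed a similarity search, assuming per-residue embeddings are mean-pooled into one vector per protein (a common choice, not one the source specifies). `mean_pool`, `cosine_similarity`, and `embed_with_esmc` are hypothetical helper names. The ESM C calls follow the `esm` package README as best understood and are imported lazily, so the pure-Python helpers run without the package.

```python
import math
from typing import List, Sequence


def mean_pool(per_residue: Sequence[Sequence[float]]) -> List[float]:
    """Average per-residue vectors into one fixed-length protein vector."""
    if not per_residue:
        raise ValueError("need at least one residue vector")
    count = len(per_residue)
    dim = len(per_residue[0])
    return [sum(row[j] for row in per_residue) / count for j in range(dim)]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def embed_with_esmc(
    sequence: str, model_name: str = "esmc_300m", device: str = "cpu"
) -> List[List[float]]:
    """Return per-token embeddings for one sequence (requires `esm`)."""
    from esm.models.esmc import ESMC
    from esm.sdk.api import ESMProtein, LogitsConfig

    client = ESMC.from_pretrained(model_name).to(device)
    protein_tensor = client.encode(ESMProtein(sequence=sequence))
    output = client.logits(
        protein_tensor, LogitsConfig(sequence=True, return_embeddings=True)
    )
    # Assumption: embeddings have shape (1, tokens, dim), and the token axis
    # includes special start/end tokens alongside the residues.
    return output.embeddings[0].float().cpu().tolist()
```

With the package installed, two proteins could then be compared via `cosine_similarity(mean_pool(embed_with_esmc(a)), mean_pool(embed_with_esmc(b)))`.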

3. Structure Prediction and Inverse Folding


Supports predicting structure from sequence and designing sequences from structure. For inverse folding tasks, you can input a target structure (PDB format), remove sequence information, and let the model generate sequences that fold into that structure. This is useful for protein stability engineering and designing specific structural scaffolds.
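The inverse folding steps above (load a structure, drop the sequence, regenerate it) might look roughly like this. `inverse_fold` and `sequence_identity` are hypothetical names. Comparing the designed sequence with the native one is an illustrative check added here, not something the source prescribes. The `esm` calls follow the package README as best understood and are imported lazily.

```python
from typing import Tuple


def sequence_identity(a: str, b: str) -> float:
    """Fraction of positions where two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def inverse_fold(
    pdb_path: str,
    model_name: str = "esm3-sm-open-v1",  # assumption: local open model
    device: str = "cpu",
    num_steps: int = 8,
) -> Tuple[str, str]:
    """Return (native_sequence, designed_sequence) for a PDB structure."""
    # Lazy imports: these require the `esm` package and downloaded weights.
    from esm.models.esm3 import ESM3
    from esm.sdk.api import ESMProtein, GenerationConfig

    model = ESM3.from_pretrained(model_name).to(device)
    protein = ESMProtein.from_pdb(pdb_path)
    native = protein.sequence

    # Remove the sequence so only the structure conditions the generation.
    protein.sequence = None
    designed = model.generate(
        protein, GenerationConfig(track="sequence", num_steps=num_steps)
    )
    return native, designed.sequence
```

A designed sequence does not need high identity to the native one to fold into the same structure, so `sequence_identity` is only a rough sanity check.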

Frequently Asked Questions

What is the difference between ESM and AlphaFold?


ESM and AlphaFold are both deep learning tools for proteins, but they serve different purposes. AlphaFold focuses primarily on high-accuracy protein structure prediction, while ESM is a broader protein language model toolkit that, in addition to structure prediction, supports sequence generation, inverse folding, function prediction, and embedding extraction. If you only need to predict the structure of a known sequence, AlphaFold may be more accurate; if you need protein design or several downstream tasks, ESM is the better fit.

How to choose between ESM3 and ESM C?


ESM3 is a generative model suited to creative tasks: creating new sequences, predicting structures, and designing functions. ESM C is an embedding model suited to converting sequences into vector representations for classification, similarity computation, and other analytical tasks. In short, use ESM3 when you need "generation" and ESM C when you need "analysis." They can also be used together, for example by generating candidate sequences with ESM3 and then extracting features with ESM C to filter them.

How to choose between local deployment and cloud API?


Local deployment suits development and testing, data-sensitive scenarios, or work that requires many iterative adjustments. esm3-sm-open-v1 is an openly available model that can run entirely locally, but it is relatively small (1.4B parameters). The cloud Forge API provides larger models (7B, 98B) with better quality and speed, but requires network access and API quota. A recommended approach is to validate ideas quickly with the local model first, then switch to the API for large-scale, high-quality generation.
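One way to switch between the two backends is to key the choice on whether an API token is configured. The sketch below assumes a hypothetical environment variable `ESM_FORGE_TOKEN`. The Forge model name and URL are taken from the `esm` package README as best understood and may change. `choose_backend` and `make_client` are illustrative names, not part of the library.

```python
import os
from typing import Mapping, Optional

TOKEN_VAR = "ESM_FORGE_TOKEN"  # hypothetical variable name, for illustration


def choose_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "forge" when a token is configured, otherwise "local"."""
    env = os.environ if env is None else env
    return "forge" if env.get(TOKEN_VAR) else "local"


def make_client(env: Optional[Mapping[str, str]] = None, device: str = "cpu"):
    """Build an ESM3 inference client for the selected backend."""
    env = os.environ if env is None else env
    if choose_backend(env) == "forge":
        # Lazy import: requires the `esm` package and network access.
        from esm.sdk.forge import ESM3ForgeInferenceClient

        return ESM3ForgeInferenceClient(
            model="esm3-medium-2024-08",  # assumption: name may change
            url="https://forge.evolutionaryscale.ai",
            token=env[TOKEN_VAR],
        )
    # Lazy import: requires the `esm` package and downloaded weights.
    from esm.models.esm3 import ESM3

    return ESM3.from_pretrained("esm3-sm-open-v1").to(device)
```

Both clients are meant to expose the same `generate` interface, so code written against the local model should need little or no change when moved to the API.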