TorchDrug

TorchDrug - A PyTorch-native Graph Neural Network Toolkit for Drug Discovery

Overview

TorchDrug is a machine learning toolbox built on PyTorch, designed specifically for drug discovery and molecular science research. It offers end-to-end solutions across 40+ datasets and 20+ model architectures for molecular property prediction, protein modeling, knowledge graph reasoning, molecule generation, and retrosynthetic planning.

Use Cases

1. Drug Molecule Design and Screening

Predict molecular properties such as solubility, toxicity, bioactivity, and other ADMET attributes with GNN models. Candidates can be screened against real datasets such as blood-brain barrier penetration (BBBP), which helps narrow the candidate pool early in drug discovery.

2. Protein Function and Structure Prediction

Use pretrained models (e.g., ESM) and structure-aware models (e.g., GearNet) for enzyme function prediction, subcellular localization, and protein–protein interaction analysis, with seamless integration of AlphaFold predicted structures.

3. Biomedical Knowledge Graph Reasoning

On large-scale biomedical knowledge graphs such as Hetionet (45,000+ entities), use embedding models such as TransE and RotatE for drug repurposing, disease mechanism discovery, and multi-hop reasoning to uncover potential new therapeutic strategies.
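To make the embedding idea concrete, here is a minimal pure-Python sketch of TransE-style scoring. It is not TorchDrug's implementation: the function names, the 2-D vectors, and the entity names are hypothetical and hand-picked for illustration. TransE treats a relation as a translation in embedding space, so a triple (head, relation, tail) is plausible when head + relation lands close to tail.

```python
import math


def transe_score(head, relation, tail):
    """Negative L2 distance ||h + r - t||; a higher score means a more plausible triple."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_tails(head, relation, candidates):
    """Return candidate names sorted from most to least plausible tail entity."""
    return sorted(
        candidates,
        key=lambda name: transe_score(head, relation, candidates[name]),
        reverse=True,
    )


# Toy embeddings (hypothetical values, not trained on any real graph).
drug = [1.0, 0.0]
treats = [0.0, 1.0]
diseases = {
    "disease_a": [1.0, 1.0],   # exactly drug + treats, so the distance is zero
    "disease_b": [3.0, -2.0],  # far from drug + treats
}

ranking = rank_tails(drug, treats, diseases)
```

In a drug repurposing setting, the same ranking step would be applied to trained embeddings, with every disease in the graph as a candidate tail for a (drug, treats, ?) query.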

Core Features

1. Molecular Property Prediction

Integrates 20+ molecular datasets, including BBBP, HIV, Tox21, and QM9, and provides multiple GNN architectures such as GIN, GAT, and SchNet. Supports binary classification and multi-task property prediction, and uses scaffold splits so that models are evaluated on molecules whose core structures were not seen during training.
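The idea behind a scaffold split can be sketched in a few lines of plain Python. This is an illustration under assumptions, not TorchDrug's own routine: it takes precomputed scaffold keys as input (in practice these are usually Bemis-Murcko scaffold SMILES computed with a cheminformatics toolkit), and the function name and greedy assignment rule are choices made for this example.

```python
from collections import defaultdict


def scaffold_split(scaffolds, fractions=(0.8, 0.1, 0.1)):
    """Split sample indices so that no scaffold appears in more than one subset.

    scaffolds[i] is a precomputed scaffold key for sample i (an assumption of
    this sketch; computing scaffolds from molecules is out of scope here).
    """
    groups = defaultdict(list)
    for index, scaffold in enumerate(scaffolds):
        groups[scaffold].append(index)

    # Largest groups first, so frequent scaffolds end up in the training set.
    ordered = sorted(groups.values(), key=lambda group: (-len(group), group[0]))

    total = len(scaffolds)
    train_cap = fractions[0] * total
    valid_cap = (fractions[0] + fractions[1]) * total

    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= train_cap:
            train.extend(group)
        elif len(train) + len(valid) + len(group) <= valid_cap:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test


# Ten molecules represented only by their scaffold keys.
scaffolds = ["c1ccccc1"] * 6 + ["c1ccncc1"] * 2 + ["C1CCCCC1", "c1ccoc1"]
train, valid, test = scaffold_split(scaffolds)
```

Because whole scaffold groups are assigned together, rare scaffolds tend to fall into the validation and test sets, which makes the evaluation harder and closer to real use than a random split.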

2. Retrosynthetic Planning

Implements synthesis route prediction from target molecules to starting materials on the USPTO-50k chemical reaction dataset. The model works in two stages, reaction center identification followed by synthon completion, and supports multi-step synthesis planning and commercial availability checks.

3. Flexible Model Ecosystem

Offers general GNNs (GCN, GAT, GIN, RGCN, MPNN), the 3D-aware models SchNet and GearNet, and the knowledge graph embedding models TransE and ComplEx. All are implemented natively in PyTorch and support training acceleration with PyTorch Lightning.

Frequently Asked Questions

Who is TorchDrug suitable for?

TorchDrug is primarily aimed at drug discovery researchers, computational biologists, and cheminformatics scientists. If you need to build custom GNN architectures for drug discovery, protein modeling, or knowledge graph reasoning, TorchDrug is a good fit. If you are more focused on pretrained models and diverse feature extractors, DeepChem may be more appropriate; if your focus is benchmark datasets, consider PyTDC.

How do I get started with TorchDrug?

Install using uv pip install torchdrug, or use torchdrug[full] to install all optional dependencies. Start by loading a dataset (e.g., datasets.BBBP(path), where path is the directory the data is downloaded to), choose a suitable model (e.g., GIN for molecular graphs), define the prediction task, and then train the model using a standard PyTorch training loop or PyTorch Lightning.
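The steps above might look roughly like the following sketch. It follows the pattern of TorchDrug's published quick start, but uses TorchDrug's built-in core.Engine trainer instead of a hand-written loop and a random split for brevity. The helper names split_lengths and train_bbbp, the download path, and the hyperparameters are assumptions chosen for illustration; argument names may differ between TorchDrug versions.

```python
def split_lengths(num_samples, fractions=(0.8, 0.1)):
    """Sizes of the train/valid/test subsets; the test set takes the remainder."""
    lengths = [int(fraction * num_samples) for fraction in fractions]
    lengths.append(num_samples - sum(lengths))
    return lengths


def train_bbbp(path="~/molecule-datasets/", num_epoch=10):
    """Train a GIN classifier on BBBP (downloads the dataset on first use)."""
    # Imported lazily so this file loads even without torch/torchdrug installed.
    import torch
    from torchdrug import core, datasets, models, tasks

    # 1. Load the dataset.
    dataset = datasets.BBBP(path)
    train_set, valid_set, test_set = torch.utils.data.random_split(
        dataset, split_lengths(len(dataset))
    )

    # 2. Choose a model: GIN over molecular graphs.
    model = models.GIN(
        input_dim=dataset.node_feature_dim,
        hidden_dims=[256, 256, 256, 256],
        short_cut=True,
        batch_norm=True,
        concat_hidden=True,
    )

    # 3. Define the prediction task (binary classification).
    task = tasks.PropertyPrediction(
        model, task=dataset.tasks, criterion="bce", metric=("auprc", "auroc")
    )

    # 4. Train and evaluate. gpus=None runs on CPU; pass e.g. [0] for a GPU.
    optimizer = torch.optim.Adam(task.parameters(), lr=1e-3)
    solver = core.Engine(
        task, train_set, valid_set, test_set, optimizer, gpus=None, batch_size=256
    )
    solver.train(num_epoch=num_epoch)
    return solver.evaluate("valid")
```

Calling train_bbbp() runs the whole pipeline and returns the validation metrics. For a realistic estimate of generalization, the random split should be replaced with a scaffold split.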

How does TorchDrug interoperate with other drug discovery libraries?

TorchDrug is designed for interoperability: it works with RDKit to convert between SMILES strings and TorchDrug molecule objects, supports predicted protein structures from AlphaFold and ESM, and is fully compatible with the PyTorch ecosystem, making it easy to integrate into existing deep learning workflows.
