deepchem

Molecular ML with diverse featurizers and pre-built datasets. Use it for property prediction (ADMET, toxicity) with traditional ML or GNNs when you want extensive featurization options and MoleculeNet benchmarks. Best for quick experiments with pretrained models and diverse molecular representations. For graph-first PyTorch workflows, use torchdrug; for benchmark datasets, use pytdc.

Install


Download the skill and extract it to your skills directory, or send the following command to OpenClaw for automatic installation:

Download and install this skill https://openskills.cc/api/download?slug=k-dense-ai-scientific-skills-deepchem&locale=en&source=copy

DeepChem - Python Library for Molecular Machine Learning and Drug Discovery

Skill Overview


DeepChem is a Python machine learning library designed for chemistry, materials science, and biology. It provides molecular data loading, featurization, graph neural networks, and pretrained models for molecular property prediction and drug discovery.

Applicable Scenarios

1. Molecular Property Prediction


When you need to predict physicochemical properties or bioactivity of molecules, such as solubility, toxicity, binding affinity, or ADMET properties. DeepChem offers 30+ MoleculeNet benchmark datasets and a range of models, from random forests to graph neural networks.

2. Drug Discovery and Screening


Suitable for lead compound screening, activity prediction, and toxicity assessment in drug development. It supports scaffold-based data splitting (ScaffoldSplitter), which keeps molecules that share a Bemis-Murcko scaffold in the same subset. This prevents structurally similar molecules from leaking between the training and test sets, so evaluation better reflects performance on unseen chemotypes.

3. Few-Shot / Low-Data Transfer Learning


When experimental data are limited (fewer than about 1,000 samples), fine-tuning a pretrained model such as ChemBERTa, GROVER, or MolFormer often gives better predictive performance than training from scratch.

Core Features

Molecular Data Loading and Featurization


Supports multiple chemical data formats (SMILES, SDF, FASTA) and provides 20+ featurization methods: molecular fingerprints (e.g. ECFP), descriptors, graph representations, 3D structural features, and more. The skill recommends a suitable featurization scheme based on the model type.

Graph Neural Network Models


Built-in GCN, GAT, MPNN, AttentiveFP, and other graph neural network architectures designed for molecular structures. Combined with MolGraphConvFeaturizer or DMPNNFeaturizer, they enable end-to-end learning of molecular representations.

MoleculeNet Benchmarks


One-line loading of 30+ standard benchmark datasets such as Tox21, BBBP, Delaney (ESOL), and QM9, with standardized train/validation/test splits and evaluation metrics for easy model comparison and benchmarking.

Frequently Asked Questions

Is DeepChem suitable for beginners?


Yes. DeepChem provides preloaded MoleculeNet datasets and a concise API for getting started with molecular machine learning quickly. A good first step is a simple baseline such as a random forest on molecular fingerprints, followed by deep learning models once the baseline is in place.

What model should be chosen for small datasets?


For datasets with fewer than 1,000 samples, use transfer learning (fine-tuning ChemBERTa or GROVER) or a traditional random forest or XGBoost model on molecular fingerprints. Deep learning models trained from scratch tend to overfit on small datasets.

What is the difference between DeepChem and torchdrug?


DeepChem's strengths are its diverse featurization methods and rich collection of preloaded datasets (MoleculeNet), making it suitable for rapid experiments and traditional ML workflows. torchdrug focuses more on PyTorch graph neural networks and is better suited for research that requires highly customized graph models.