rdkit

Cheminformatics toolkit for fine-grained molecular control: SMILES/SDF parsing, descriptors (MW, LogP, TPSA), fingerprints, substructure search, 2D/3D coordinate generation, similarity, and reactions. For standard workflows with a simpler interface, use datamol (a wrapper around RDKit). Use RDKit directly for advanced control, custom sanitization, and specialized algorithms.

Install


Download and extract to your skills directory

Copy command and send to OpenClaw for auto-install:

Download and install this skill https://openskills.cc/api/download?slug=k-dense-ai-scientific-skills-rdkit&locale=en&source=copy

RDKit Cheminformatics Toolkit - Python Molecule Analysis and Drug Discovery

Capabilities Overview


RDKit is an open-source cheminformatics toolkit that provides a Python API for molecular structure parsing, property calculation, similarity analysis, and chemical reaction handling. It supports multiple molecular formats such as SMILES, SDF, and MOL, and is suitable for drug discovery, computational chemistry, and compound library analysis.

Applicable Scenarios

1. Drug Discovery and Virtual Screening


In the early stages of drug development, RDKit can be used for large-scale screening of compound libraries. Using molecular fingerprints and similarity searches, you can quickly identify candidate molecules similar to lead compounds. Combined with Lipinski's Rule of Five and other drug-likeness analyses, RDKit can filter out compounds that do not meet drug-like criteria, reducing downstream experimental costs.
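The similarity-search step can be sketched as follows. This is a minimal illustration, assuming Morgan fingerprints with radius 2 and 2048 bits (common choices, not values prescribed here); the lead compound and the four-molecule "library" are arbitrary examples, not a real screening set.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

# Illustrative lead compound (aspirin) and a tiny example library
lead_smiles = "CC(=O)Oc1ccccc1C(=O)O"
library = {
    "salicylic_acid": "OC(=O)c1ccccc1O",
    "methyl_salicylate": "COC(=O)c1ccccc1O",
    "ibuprofen": "CC(C)Cc1ccc(cc1)C(C)C(=O)O",
    "caffeine": "Cn1cnc2c1c(=O)n(C)c(=O)n2C",
}

# Morgan (ECFP4-like) bit-vector fingerprints: radius 2, 2048 bits (assumed settings)
gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)


def fingerprint(smiles):
    mol = Chem.MolFromSmiles(smiles)
    return gen.GetFingerprint(mol)


lead_fp = fingerprint(lead_smiles)
names = list(library)
fps = [fingerprint(library[name]) for name in names]

# Compare the lead against every library fingerprint in one call
scores = DataStructs.BulkTanimotoSimilarity(lead_fp, fps)

# Rank candidates from most to least similar to the lead
ranked = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

In a real screen the library would come from a file reader and the ranked list would typically be cut at a similarity threshold chosen for the project.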

2. Molecular Property Calculation and Descriptor Analysis


RDKit can compute more than 200 molecular descriptors, including molecular weight, LogP, TPSA, hydrogen bond donor/acceptor counts, and other key properties. It supports batch processing, so thousands of compounds can be profiled in a single pass for structure-activity relationship studies, ADMET property prediction, and molecular optimization decisions.
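A small batch-descriptor sketch, assuming a handful of illustrative SMILES (one deliberately invalid) and a hypothetical helper `compute_properties`. It computes a few named descriptors per molecule and then the full set exposed through `Descriptors.descList`, a list of `(name, function)` pairs.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

# Example inputs; "C1CC" has an unclosed ring and fails to parse
smiles_list = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O", "C1CC"]


def compute_properties(smiles):
    """Hypothetical helper: return selected descriptors, or None for unparsable input."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return {
        "MolWt": Descriptors.MolWt(mol),
        "LogP": Descriptors.MolLogP(mol),
        "TPSA": Descriptors.TPSA(mol),
        "HBD": Descriptors.NumHDonors(mol),
        "HBA": Descriptors.NumHAcceptors(mol),
    }


results = {smi: compute_properties(smi) for smi in smiles_list}

# Every registered descriptor for one molecule
aspirin = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")
all_descriptors = {name: func(aspirin) for name, func in Descriptors.descList}
print(len(all_descriptors), "descriptors computed")
```

Returning `None` for bad input keeps one malformed record from aborting a large batch.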

3. Chemical Reactions and Structural Transformations


Define and execute chemical reactions using reaction SMARTS, with support for atom mapping and stereochemistry retention. This is useful for reaction product prediction, reaction pathway analysis, and derivative library generation. Combined with substructure search and substructure replacement functions (such as Chem.ReplaceSubstructs), RDKit enables rapid design of new molecular structures.
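A minimal reaction sketch. The amide-coupling template below is a simplified, illustrative reaction SMARTS written for this example (carboxylic acid + primary amine), not a curated RDKit reaction; production templates usually need extra constraints. Mapped atoms (`:1`, `:2`, `:3`) carry over to the product, and the unmapped hydroxyl oxygen is dropped.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Simplified amide formation: acid C(=O)OH + primary amine NH2 -> amide
rxn = AllChem.ReactionFromSmarts(
    "[C:1](=[O:2])[OX2H1].[NX3;H2:3]>>[C:1](=[O:2])[N:3]"
)

acid = Chem.MolFromSmiles("CC(=O)O")      # acetic acid
amine = Chem.MolFromSmiles("NCc1ccccc1")  # benzylamine

# RunReactants returns a tuple of product tuples, one per template match combination
products = rxn.RunReactants((acid, amine))

unique_products = set()
for product_set in products:
    product = product_set[0]
    Chem.SanitizeMol(product)  # products come back unsanitized
    unique_products.add(Chem.MolToSmiles(product))

print(unique_products)
```

Collecting canonical SMILES in a set removes duplicates, which matters when a template matches a reactant in several equivalent ways.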

Core Features

1. Multi-format Molecular I/O and Parsing


Supports reading and writing mainstream chemical file formats such as SMILES, SDF, MOL, and InChI. Provides batch and streaming readers to handle large compound libraries efficiently. Parsing applies sanitization by default (valence checks, aromaticity perception), and invalid input returns None instead of a molecule; the rdMolStandardize module provides further standardization.
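A short parsing and round-trip sketch, using phenol as an arbitrary example. It shows invalid input returning `None`, aromaticity perception making a Kekulé input canonicalize like an aromatic one, a MOL-block round trip, and InChI output (assuming an RDKit build with InChI support, which the standard pip and conda packages include).

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("c1ccccc1O")  # phenol, aromatic SMILES

# Pentavalent carbon fails the valence check during sanitization -> None
bad = Chem.MolFromSmiles("C(C)(C)(C)(C)C")

# Kekulé input is perceived as aromatic, so canonical SMILES match
kekule = Chem.MolFromSmiles("OC1=CC=CC=C1")
same_structure = Chem.MolToSmiles(kekule) == Chem.MolToSmiles(mol)

# Round trip through a MOL block (the record format used inside SDF files)
molblock = Chem.MolToMolBlock(mol)
mol_from_block = Chem.MolFromMolBlock(molblock)
canonical = Chem.MolToSmiles(mol_from_block)

# InChI and InChIKey identifiers
inchi = Chem.MolToInchi(mol)
inchikey = Chem.MolToInchiKey(mol)

print(bad, same_structure, canonical, inchi, inchikey)
```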

2. Molecular Fingerprints and Similarity Analysis


Offers various fingerprint algorithms: RDKit topological (path-based) fingerprints, Morgan (circular, ECFP-like) fingerprints, MACCS keys (166 structural keys), and more. Supports similarity metrics such as Tanimoto, Dice, and Cosine for compound clustering, diversity analysis, and virtual screening.
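To compare fingerprint types and metrics side by side, a sketch like the following could be used. The two molecules and the 2048-bit sizes are illustrative assumptions. Note that RDKit's MACCS implementation returns a 167-bit vector in which bit 0 is unused, so the 166 keys occupy bits 1 to 166.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import MACCSkeys, rdFingerprintGenerator

m1 = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
m2 = Chem.MolFromSmiles("OC(=O)c1ccccc1O")        # salicylic acid

morgan_gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)
rdkit_gen = rdFingerprintGenerator.GetRDKitFPGenerator(fpSize=2048)

fingerprints = {
    "morgan": (morgan_gen.GetFingerprint(m1), morgan_gen.GetFingerprint(m2)),
    "rdkit": (rdkit_gen.GetFingerprint(m1), rdkit_gen.GetFingerprint(m2)),
    "maccs": (MACCSkeys.GenMACCSKeys(m1), MACCSkeys.GenMACCSKeys(m2)),
}

results = {}
for name, (fp_a, fp_b) in fingerprints.items():
    results[name] = {
        "tanimoto": DataStructs.TanimotoSimilarity(fp_a, fp_b),
        "dice": DataStructs.DiceSimilarity(fp_a, fp_b),
        "cosine": DataStructs.CosineSimilarity(fp_a, fp_b),
    }
    r = results[name]
    print(f"{name}: Tanimoto={r['tanimoto']:.2f} Dice={r['dice']:.2f} Cosine={r['cosine']:.2f}")
```

The same pair of molecules can score quite differently depending on the fingerprint, so similarity thresholds are only meaningful for a fixed fingerprint type.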

3. 3D Conformer Generation and Visualization


Generates 3D molecular coordinates using the ETKDG algorithm and supports UFF/MMFF force field optimization. Can generate multiple conformers for conformational analysis. Includes high-quality molecular drawing capabilities with substructure highlighting and Jupyter Notebook integration.
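A minimal conformer-generation sketch, assuming 1-butanol as the example molecule, 10 conformers, and a fixed random seed for reproducibility (all illustrative choices). Explicit hydrogens are added before embedding, ETKDGv3 generates the coordinates, and MMFF then optimizes each conformer.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Explicit hydrogens are needed for sensible 3D geometries
mol = Chem.AddHs(Chem.MolFromSmiles("CCCCO"))  # 1-butanol

params = AllChem.ETKDGv3()
params.randomSeed = 42  # reproducible embedding
conf_ids = list(AllChem.EmbedMultipleConfs(mol, numConfs=10, params=params))

# Optimize every conformer with MMFF; each entry is (not_converged_flag, energy)
opt_results = AllChem.MMFFOptimizeMoleculeConfs(mol, maxIters=500)
energies = [energy for _, energy in opt_results]

# Conformer with the lowest MMFF energy
best_index = min(range(len(energies)), key=energies.__getitem__)
best_conf_id = conf_ids[best_index]
print(len(conf_ids), "conformers; lowest energy:", energies[best_index], "conf id:", best_conf_id)
```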

Frequently Asked Questions

What is RDKit? Who is it for?


RDKit is an open-source cheminformatics and computational chemistry library, primarily aimed at drug discovery scientists, computational chemists, cheminformatics researchers, and data scientists. If you need to handle molecular structures, compute molecular properties, or perform compound screening, RDKit is one of the most commonly used Python tools.

Which should you choose, RDKit or datamol?


If your needs are standard molecular processing workflows (reading molecules, computing descriptors, drawing), datamol provides a more concise API and is quicker to get started with. If you need fine-grained control of the molecular processing pipeline, custom standardization rules, or specialized algorithms, using RDKit directly is more appropriate. Datamol is essentially a wrapper around RDKit.
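As an example of the fine-grained control that direct RDKit use allows, the sketch below applies a widely used idiom for loading a structure that violates RDKit's default valence rules: parse without sanitization, compute valences non-strictly, then run only selected sanitization steps. The neutral tetravalent nitrogen input is an illustrative assumption; whether skipping the valence check is acceptable depends on the application.

```python
from rdkit import Chem

smiles = "CN(C)(C)C"  # neutral tetravalent nitrogen: rejected by default valence rules

# Default parsing runs full sanitization and returns None (an error is logged)
strict = Chem.MolFromSmiles(smiles)

# Fine-grained alternative: skip the strict valence check, keep other perception steps
mol = Chem.MolFromSmiles(smiles, sanitize=False)
mol.UpdatePropertyCache(strict=False)  # compute valences without raising
Chem.SanitizeMol(
    mol,
    Chem.SanitizeFlags.SANITIZE_FINDRADICALS
    | Chem.SanitizeFlags.SANITIZE_KEKULIZE
    | Chem.SanitizeFlags.SANITIZE_SETAROMATICITY
    | Chem.SanitizeFlags.SANITIZE_SETCONJUGATION
    | Chem.SanitizeFlags.SANITIZE_SETHYBRIDIZATION
    | Chem.SanitizeFlags.SANITIZE_SYMMRINGS,
    catchErrors=True,  # report failures instead of raising
)

print(strict, Chem.MolToSmiles(mol))
```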

What molecular file formats does RDKit support?


RDKit supports mainstream chemical formats, including SMILES and SMARTS strings, SDF (structure-data file), MOL, MOL2 (reading only), InChI, and PDB. It can read and write single molecules or process batches, and it can read gzip-compressed files when given an open gzip file object. For very large files, you can use ForwardSDMolSupplier to stream molecules one at a time and avoid running out of memory.
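A streaming sketch, assuming a temporary file named `library.sdf.gz` created on the fly so the example is self-contained. Each SDF record is written as a MOL block followed by the `$$$$` delimiter, and `ForwardSDMolSupplier` then reads the gzip stream one molecule at a time.

```python
import gzip
import os
import tempfile

from rdkit import Chem

# Write a small gzip-compressed SDF (illustrative data)
path = os.path.join(tempfile.mkdtemp(), "library.sdf.gz")
with gzip.open(path, "wt") as out:
    for smi in ["CCO", "c1ccccc1", "CC(=O)O"]:
        mol = Chem.MolFromSmiles(smi)
        mol.SetProp("_Name", smi)         # stored on the first line of the MOL block
        out.write(Chem.MolToMolBlock(mol))
        out.write("$$$$\n")               # SDF record delimiter

# Stream records from the compressed file without loading it all at once
names, atom_counts = [], []
with gzip.open(path, "rb") as handle:
    for mol in Chem.ForwardSDMolSupplier(handle):
        if mol is None:                   # unparsable records yield None
            continue
        names.append(mol.GetProp("_Name"))
        atom_counts.append(mol.GetNumAtoms())

print(names, atom_counts)
```

Because the supplier is consumed inside the `with` block, only one record is held in memory at a time.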

Can RDKit be used for commercial projects for free?


Yes. RDKit is released under the BSD-3-Clause license, permitting free use in commercial and academic projects without special authorization. This means you can use RDKit in internal projects at a pharmaceutical company, commercial software, or paid services.

How do you assess a compound's drug-likeness?


RDKit can quickly compute the four criteria of Lipinski's Rule of Five (so named because each threshold is a multiple of five): molecular weight ≤ 500, LogP ≤ 5, hydrogen bond donors ≤ 5, and hydrogen bond acceptors ≤ 10. It can also calculate supplementary metrics such as TPSA (topological polar surface area) and the number of rotatable bonds. These indicators give a preliminary assessment of a compound's potential for oral absorption.
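A Rule of Five check might be sketched as below, using a hypothetical helper `rule_of_five`. Donors and acceptors are counted with `CalcNumLipinskiHBD`/`CalcNumLipinskiHBA`, which follow Lipinski's original NH/OH and N/O counting; RDKit's `NumHDonors`/`NumHAcceptors` use different, SMARTS-based definitions and can give other numbers. Treating at most one violation as acceptable is a common convention, not something RDKit enforces.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors, rdMolDescriptors


def rule_of_five(smiles):
    """Hypothetical helper: Rule of Five properties and violation count."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse {smiles!r}")
    props = {
        "MW": Descriptors.MolWt(mol),
        "LogP": Descriptors.MolLogP(mol),  # Crippen estimate
        "HBD": rdMolDescriptors.CalcNumLipinskiHBD(mol),
        "HBA": rdMolDescriptors.CalcNumLipinskiHBA(mol),
    }
    props["violations"] = sum([
        props["MW"] > 500,
        props["LogP"] > 5,
        props["HBD"] > 5,
        props["HBA"] > 10,
    ])
    # Supplementary metrics mentioned above
    props["TPSA"] = rdMolDescriptors.CalcTPSA(mol)
    props["RotatableBonds"] = rdMolDescriptors.CalcNumRotatableBonds(mol)
    return props


aspirin = rule_of_five("CC(=O)Oc1ccccc1C(=O)O")
long_alkane = rule_of_five("C" * 40)  # heavy and very lipophilic
print(aspirin["violations"], long_alkane["violations"])
```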