pubchem-database

Query PubChem (110M+ compounds) via the PUG-REST API or PubChemPy. Search by name, CID, or SMILES; retrieve molecular properties and bioactivity data; run similarity and substructure searches for cheminformatics work.

Install

Download and extract to your skills directory

Copy command and send to OpenClaw for auto-install:

Download and install this skill https://openskills.cc/api/download?slug=k-dense-ai-scientific-skills-pubchem-database&locale=en&source=copy

PubChem Compound Database Query Tool

Skill Overview


This skill is a cheminformatics tool built on the PUG-REST API and PubChemPy. It lets users query PubChem, the world’s largest free chemical database, for compound structures, molecular properties, similarity and substructure matches, and bioactivity data.

Applicable Scenarios

1. Drug Discovery and Lead Compound Screening


In drug discovery, researchers need to quickly find compounds that are structurally similar to known drugs or screen for molecules containing specific pharmacophores. This tool supports similarity and substructure searches using SMILES, helping researchers find drug candidates among more than 110 million compounds and assess drug-likeness using Lipinski’s rule of five.
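
The drug-likeness assessment can be sketched as a small pure function. This is a minimal illustration that assumes the property values have already been retrieved. The function names and the dictionary keys (borrowed from PUG-REST property names) are choices made for this example and are not part of the skill itself. Allowing at most one violation follows the common reading of Lipinski's rule of five.

```python
def lipinski_violations(props):
    """Return the Lipinski rule-of-five criteria that a compound violates.

    `props` is a dict keyed by PUG-REST style property names (an assumption
    of this sketch). Missing values are skipped rather than counted.
    """
    checks = [
        ("MolecularWeight", 500.0, "molecular weight above 500 Da"),
        ("XLogP", 5.0, "XLogP above 5"),
        ("HBondDonorCount", 5, "more than 5 hydrogen bond donors"),
        ("HBondAcceptorCount", 10, "more than 10 hydrogen bond acceptors"),
    ]
    violations = []
    for key, limit, message in checks:
        value = props.get(key)
        # PubChem may return numbers as strings, so convert before comparing.
        if value is not None and float(value) > limit:
            violations.append(message)
    return violations


def is_drug_like(props, max_violations=1):
    """True when the compound breaks no more than `max_violations` criteria."""
    return len(lipinski_violations(props)) <= max_violations


# Illustrative values for aspirin (CID 2244).
aspirin = {
    "MolecularWeight": "180.16",
    "XLogP": 1.2,
    "HBondDonorCount": 1,
    "HBondAcceptorCount": 4,
}
print(lipinski_violations(aspirin), is_drug_like(aspirin))
```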

2. Bulk Retrieval and Analysis of Compound Properties


Chemists and researchers often need molecular properties (molecular weight, LogP, TPSA, hydrogen bond donor and acceptor counts, etc.) for many compounds at once for structure–activity relationship analysis. This tool provides batch queries that retrieve full property lists for dozens of compounds in one call, and it can export the results as a DataFrame for further statistical analysis.
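
A batch property query can be sketched directly against PUG-REST, which accepts comma-separated CIDs and property names in a single URL. The helper names below are hypothetical and the network call is left out; the sketch only shows how the request is formed and how the documented `PropertyTable` response shape can be flattened into rows.

```python
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(cids, properties, fmt="JSON"):
    """Build one PUG-REST URL that requests several properties for several CIDs."""
    cid_part = ",".join(str(int(c)) for c in cids)
    prop_part = ",".join(properties)
    return f"{PUG_REST}/compound/cid/{cid_part}/property/{prop_part}/{fmt}"


def rows_from_response(payload):
    """Extract the per-compound dicts from a decoded JSON property response."""
    return payload.get("PropertyTable", {}).get("Properties", [])


url = property_url([2244, 3672], ["MolecularWeight", "XLogP", "TPSA"])
print(url)
```

If pandas is installed, the list returned by `rows_from_response` can be passed to `pandas.DataFrame(rows)` to get one row per compound.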

3. Chemical Identifier Conversion and Structure Visualization


Researchers frequently need to convert between chemical identifier formats (e.g., from compound name to SMILES, or from CID to InChI), or to obtain 2D structure images for papers and reports. This tool converts between the supported identifier types and can download structure files in PNG, SDF, or JSON format.
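
As a rough sketch of identifier conversion and structure download, the helpers below build the corresponding PUG-REST URLs. The function names are made up for this example, and `download` is a thin wrapper around the standard library that is shown but not called here.

```python
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def name_to_inchi_url(name):
    """URL that returns the InChI for a compound name as plain text."""
    # Percent-encode the name so spaces and slashes cannot break the path.
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/property/InChI/TXT"


def structure_file_url(cid, fmt="PNG"):
    """URL for a compound record as a 2D image (PNG), SDF, or JSON."""
    fmt = fmt.upper()
    if fmt not in {"PNG", "SDF", "JSON"}:
        raise ValueError(f"unsupported format: {fmt}")
    return f"{PUG_REST}/compound/cid/{int(cid)}/{fmt}"


def download(url, path, timeout=30):
    """Fetch `url` and write the raw bytes to `path` (requires network access)."""
    with urlopen(url, timeout=timeout) as response, open(path, "wb") as handle:
        handle.write(response.read())


print(name_to_inchi_url("acetylsalicylic acid"))
print(structure_file_url(2244, "png"))
```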

Core Features

1. Multi-mode Compound Search


Supports searching compounds by chemical name, CID (Compound ID), SMILES, InChI, or molecular formula. Simply enter any identifier to retrieve the compound’s full information, including IUPAC name, molecular formula, molecular weight, canonical SMILES, InChI, and other standard identifiers.
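
A multi-mode lookup with PubChemPy might look like the sketch below. `pcp.get_compounds(identifier, namespace)` is the PubChemPy call; `guess_namespace` and `search` are hypothetical helpers written for this example. The heuristic only recognizes CIDs, InChI strings, and InChIKeys, so SMILES and molecular formulas should be passed with an explicit namespace (`"smiles"` or `"formula"`).

```python
import re

# Standard InChIKey layout: 14 letters, 10 letters, 1 letter.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def guess_namespace(identifier):
    """Guess the PubChemPy namespace for an identifier; falls back to 'name'."""
    text = str(identifier).strip()
    if text.isdigit():
        return "cid"
    if text.startswith("InChI="):
        return "inchi"
    if INCHIKEY_RE.match(text):
        return "inchikey"
    return "name"


def search(identifier, namespace=None):
    """Return a list of PubChemPy Compound objects (requires network access)."""
    # Imported lazily so guess_namespace works without PubChemPy installed.
    import pubchempy as pcp

    namespace = namespace or guess_namespace(identifier)
    return pcp.get_compounds(str(identifier).strip(), namespace)


print(guess_namespace("aspirin"), guess_namespace(2244))
```

With network access, `search("aspirin")[0]` would give a compound whose `cid`, `iupac_name`, `molecular_formula`, and `inchi` attributes can then be read.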

2. Retrieval of Molecular Properties and Bioactivity Data


Retrieves 30+ molecular properties, including basic properties (molecular weight, molecular formula) and physicochemical properties (XLogP, TPSA, hydrogen bond donor and acceptor counts). It also gives access to over 270 million bioactivity assay records and supports filtering by activity outcome, so users can quickly build a picture of a compound’s bioactivity profile.
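
Filtering bioactivity records by outcome can be sketched as follows. The response layout used here (a `Table` with `Columns` and `Row`/`Cell` entries, and a column named `Activity Outcome`) is an assumption about the JSON returned by the PUG-REST `assaysummary` operation and should be checked against a live response. The sample payload is invented purely for illustration.

```python
def assay_rows(payload):
    """Turn an assay summary table into a list of dicts keyed by column name.

    Assumes the shape {"Table": {"Columns": {"Column": [...]},
    "Row": [{"Cell": [...]}, ...]}}.
    """
    table = payload.get("Table", {})
    columns = table.get("Columns", {}).get("Column", [])
    return [dict(zip(columns, row.get("Cell", []))) for row in table.get("Row", [])]


def filter_by_outcome(rows, outcome="Active"):
    """Keep only rows whose 'Activity Outcome' matches, ignoring case."""
    wanted = outcome.lower()
    return [r for r in rows if str(r.get("Activity Outcome", "")).lower() == wanted]


# Made-up payload in the assumed shape; the AIDs are placeholders.
sample = {
    "Table": {
        "Columns": {"Column": ["AID", "CID", "Activity Outcome"]},
        "Row": [
            {"Cell": ["1", "2244", "Active"]},
            {"Cell": ["2", "2244", "Inactive"]},
            {"Cell": ["3", "2244", "active"]},
        ],
    }
}
active = filter_by_outcome(assay_rows(sample))
print([row["AID"] for row in active])
```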

3. Structure Similarity and Substructure Search


Performs similarity searches based on the Tanimoto coefficient to find structurally related compounds, and substructure searches to locate molecules containing a specific substructure or functional group (such as a benzene ring, pyridine, carboxylic acid, or sulfonamide). These functions are suited to virtual screening and pharmacophore exploration.
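
For reference, the Tanimoto coefficient between two fingerprints is the number of shared "on" bits divided by the number of bits set in either one. The sketch below represents each fingerprint as a set of bit positions; the toy fingerprints are invented for illustration and are not real PubChem fingerprints.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        # Two empty fingerprints: define the similarity as 0 to avoid 0/0.
        return 0.0
    return len(a & b) / union


# Toy fingerprints: 2 shared bits out of 6 distinct bits.
fp_query = {1, 2, 3, 4}
fp_candidate = {3, 4, 5, 6}
print(round(tanimoto(fp_query, fp_candidate), 3))
```

A value of 1.0 means the two fingerprints are identical, and values near 0 mean they share almost no features.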

Frequently Asked Questions

What is the source of the PubChem database data?


PubChem is maintained by the U.S. National Center for Biotechnology Information (NCBI) and integrates chemical substance information contributed by research institutions, pharmaceutical companies, and other public databases. It contains over 110 million compounds and 270 million bioactivity records, and it is free to use.

Are there rate limits for the API?


Yes. The PUG-REST API allows at most 5 requests per second and 400 requests per minute. For bulk queries, add a 0.2–0.3 second delay between requests, and prefer CIDs over name lookups to improve efficiency. Similarity and substructure searches run asynchronously and may take 15–30 seconds to complete.
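
One way to stay under these limits is a small throttle that enforces a minimum gap between requests. This is a sketch rather than part of the skill: the `Throttle` class is hypothetical, and the clock and sleep functions are injectable only so the behavior can be checked without real waiting.

```python
import time


class Throttle:
    """Block until at least `min_interval` seconds have passed since the last call."""

    def __init__(self, min_interval=0.25, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Demonstration with a fake clock so nothing actually sleeps.
fake_time = [0.0]
slept = []


def fake_clock():
    return fake_time[0]


def fake_sleep(seconds):
    slept.append(seconds)
    fake_time[0] += seconds


throttle = Throttle(0.25, clock=fake_clock, sleep=fake_sleep)
throttle.wait()  # first call passes straight through
throttle.wait()  # second call has to wait the full interval
print(slept)
```

In real use, `Throttle()` would be created once and `throttle.wait()` called immediately before each request. A 0.25 second gap works out to at most 4 requests per second and 240 per minute.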

Can multiple compounds be queried in batch?


Yes. Using PubChemPy or the built-in compound_search.py script, you can provide a list of compounds for batch queries. It is recommended to cache commonly used CIDs first, since CID lookups are faster and more stable than name or structure searches. For very large-scale queries, process in batches and add appropriate delays.
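
The batching and caching advice can be sketched with two small helpers. Both are hypothetical and independent of compound_search.py, whose interface is not shown here. The `lookup` argument stands in for whatever function resolves a name to a CID (for example, a PubChemPy call) and is only invoked on cache misses.

```python
def chunked(items, size):
    """Split `items` into consecutive lists of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    items = list(items)
    return [items[i:i + size] for i in range(0, len(items), size)]


def resolve_cids(names, cache, lookup):
    """Map names to CIDs, calling `lookup` only for names not already cached."""
    resolved = {}
    for name in names:
        key = name.strip().lower()
        if key not in cache:
            cache[key] = lookup(name)
        resolved[name] = cache[key]
    return resolved


# Stand-in lookup table so the example runs without network access.
known = {"aspirin": 2244, "ibuprofen": 3672}
calls = []


def fake_lookup(name):
    calls.append(name)
    return known[name.strip().lower()]


cache = {}
first = resolve_cids(["aspirin", "ibuprofen"], cache, fake_lookup)
second = resolve_cids(["Aspirin"], cache, fake_lookup)  # served from the cache
print(first, second, chunked(range(5), 2))
```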