BioServices Python - Unified interface to 40+ bioinformatics databases


Skill Overview

BioServices is a Python package that provides a unified interface to 40+ bioinformatics databases and web services, including UniProt, KEGG, ChEMBL, and Reactome. It supports cross-database queries, identifier mapping, and sequence analysis through the remote tools it wraps, such as NCBI BLAST.

Applicable Scenarios

1. Cross-database Protein Analysis

When you need to integrate data from several bioinformatics databases (for example, protein sequences from UniProt, pathway information from KEGG, and interaction data from PSICQUIC), BioServices offers a consistent Python API, so you don't have to learn each database's own access methods.
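A minimal sketch of such a chained query follows. It assumes BioServices' `UniProt.mapping` and `KEGG.get_pathway_by_gene` methods behave as in recent releases. The function name `kegg_pathways_for_uniprot` and its injectable client parameters are hypothetical, added so the logic can be exercised with stand-in objects instead of live services.

```python
from typing import Any, Dict, Optional


def kegg_pathways_for_uniprot(accession: str, uniprot: Optional[Any] = None,
                              kegg: Optional[Any] = None) -> Dict[str, Any]:
    """Hypothetical helper: UniProt accession -> KEGG gene IDs -> KEGG pathways."""
    # Create real BioServices clients only when none are passed in.
    if uniprot is None:
        from bioservices import UniProt
        uniprot = UniProt(verbose=False)
    if kegg is None:
        from bioservices import KEGG
        kegg = KEGG(verbose=False)

    # Step 1: UniProt ID mapping; each hit is assumed to look like
    # {"from": "<accession>", "to": "<organism>:<gene>"}.
    mapped = uniprot.mapping(fr="UniProtKB_AC-ID", to="KEGG", query=accession)
    kegg_ids = [hit["to"] for hit in mapped.get("results", [])]

    # Step 2: split each KEGG gene ID into organism code and gene,
    # then ask KEGG which pathways contain that gene.
    pathways: Dict[str, Any] = {}
    for kegg_id in kegg_ids:
        organism, gene = kegg_id.split(":", 1)
        pathways[kegg_id] = kegg.get_pathway_by_gene(gene, organism)
    return pathways
```

With network access, `kegg_pathways_for_uniprot("P43403")` would run both queries against the live services; passing fake clients keeps the two steps testable offline.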

2. Bio-database Identifier Conversion

When mapping identifiers between different databases—such as converting UniProt IDs to KEGG IDs, or converting compound IDs among ChEMBL, ChEBI, and KEGG—BioServices' built-in mapping features can handle batch conversion tasks.
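Mapping responses usually need flattening before use. The sketch below assumes the UniProt ID-mapping JSON layout that `UniProt.mapping` returns: a `"results"` list of `{"from": ..., "to": ...}` hits and, when some IDs fail, a `"failedIds"` list. When the target is UniProtKB, `"to"` holds a full entry record, so the helper falls back to its `primaryAccession` field. The name `flatten_mapping` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def flatten_mapping(result: dict) -> Tuple[Dict[str, List[str]], List[str]]:
    """Hypothetical helper: mapping response -> ({source_id: [target_ids]}, failed_ids)."""
    mapped: Dict[str, List[str]] = defaultdict(list)
    for hit in result.get("results", []):
        target = hit["to"]
        # Mapping to UniProtKB returns a whole entry; keep only its accession.
        if isinstance(target, dict):
            target = target.get("primaryAccession", str(target))
        mapped[hit["from"]].append(target)
    # One source ID may map to several targets, hence the list values.
    return dict(mapped), list(result.get("failedIds", []))
```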

3. Sequence and Compound Data Retrieval

When you need to run BLAST sequence searches, query protein structure information, or search compound information across databases, BioServices wraps services like NCBI BLAST, PDB, and UniChem, simplifying data retrieval workflows.
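As an illustration of the retrieval side, the sketch below fetches FASTA records and splits them into header and sequence. It assumes `UniProt.retrieve(accession, frmt="fasta")` returns the record as FASTA text. `fetch_fasta` and `parse_fasta` are hypothetical names, and the parser is a plain-Python stand-in rather than part of BioServices.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple


def fetch_fasta(accessions: Iterable[str], uniprot: Optional[Any] = None) -> Dict[str, str]:
    """Hypothetical helper: download one FASTA record per accession."""
    if uniprot is None:
        from bioservices import UniProt
        uniprot = UniProt(verbose=False)
    return {acc: uniprot.retrieve(acc, frmt="fasta") for acc in accessions}


def parse_fasta(text: str) -> Dict[str, Tuple[str, str]]:
    """Parse FASTA text into {record_id: (full_header, sequence)}."""
    records: Dict[str, Tuple[str, str]] = {}
    header: Optional[str] = None
    chunks: List[str] = []

    def flush() -> None:
        # Store the record collected so far, keyed by the first word of its header.
        if header is not None:
            key = header.split()[0] if header.split() else ""
            records[key] = (header, "".join(chunks))

    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            flush()
            header, chunks = line[1:].strip(), []
        elif line and header is not None:
            # Sequence lines are concatenated until the next header.
            chunks.append(line)
    flush()
    return records
```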

Core Features

1. Protein and Sequence Analysis

Search for proteins with the UniProt service, retrieve FASTA sequences and functional annotations, run sequence similarity searches with NCBI BLAST, and fetch protein structure information from PDB.
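A protein search can be sketched as below. It assumes `UniProt.search` accepts `frmt="tsv"`, a comma-separated `columns` string of UniProt return-field names, and `limit`, as in recent BioServices releases. UniProt labels the TSV header with display names (such as `Entry` and `Length`) rather than field names. The helpers `search_uniprot_tsv` and `parse_tsv` are hypothetical.

```python
import csv
import io
from typing import Any, Dict, List, Optional


def parse_tsv(text: str) -> List[Dict[str, str]]:
    """Parse tab-separated text with a header row into a list of dicts."""
    # QUOTE_NONE: treat quote characters inside annotations as ordinary text.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def search_uniprot_tsv(query: str, columns: str = "accession,gene_names,length",
                       limit: int = 5, uniprot: Optional[Any] = None) -> List[Dict[str, str]]:
    """Hypothetical helper: run a UniProt search and return one dict per row."""
    if uniprot is None:
        from bioservices import UniProt
        uniprot = UniProt(verbose=False)
    text = uniprot.search(query, frmt="tsv", columns=columns, limit=limit)
    if not isinstance(text, str):
        # Assumption: on failure BioServices may return a non-string such as a status code.
        raise RuntimeError(f"UniProt search failed: {text!r}")
    return parse_tsv(text)
```

For example, `search_uniprot_tsv("zap70 AND organism_id:9606")` would return rows keyed by the TSV header labels.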

2. Pathway and Gene Function Analysis

Access the KEGG and Reactome databases, search metabolic pathways by gene or pathway name, extract the protein interaction networks within pathways, and parse pathway data in KGML format.
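To turn a pathway into an interaction network, the parsed KGML can be reduced to an edge list. This sketch assumes `KEGG.parse_kgml_pathway(pathway_id)` returns a dict with `"entries"` (each with `"id"` and `"name"`) and `"relations"` (each with `"entry1"`, `"entry2"`, and `"name"`). Check that layout against your installed version. `kgml_edges` and `pathway_edges` are hypothetical names.

```python
from typing import Any, Dict, List, Optional, Tuple


def kgml_edges(parsed: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: parsed KGML -> (source, target, interaction) edges."""
    # Relations point at entries by their numeric KGML id; translate ids to entry names.
    names = {str(e["id"]): e.get("name", str(e["id"])) for e in parsed.get("entries", [])}
    edges: List[Tuple[str, str, str]] = []
    for rel in parsed.get("relations", []):
        src = names.get(str(rel["entry1"]), str(rel["entry1"]))
        dst = names.get(str(rel["entry2"]), str(rel["entry2"]))
        edges.append((src, dst, rel.get("name", "")))
    return edges


def pathway_edges(pathway_id: str, kegg: Optional[Any] = None) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: download a KEGG pathway and return its edge list."""
    if kegg is None:
        from bioservices import KEGG
        kegg = KEGG(verbose=False)
    return kgml_edges(kegg.parse_kgml_pathway(pathway_id))
```

The resulting tuples can be fed directly into a graph library or written out as a simple interaction file.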

3. Cross-database ID Mapping

Perform bidirectional identifier conversion among UniProt, KEGG, Ensembl, PDB, RefSeq, and other databases, convert IDs in batches, and map compound IDs among ChEMBL, ChEBI, and KEGG with the UniChem service.
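When a forward mapping is already in hand (for example, UniProt to KEGG), the reverse direction for those same IDs can be derived locally instead of issuing a second query. This is a plain-Python sketch, not a BioServices feature: it only covers IDs that appeared in the forward result, so new IDs still need a fresh mapping request with the source and target swapped. `invert_mapping` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def invert_mapping(forward: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Reverse a one-to-many ID mapping, e.g. UniProt -> KEGG into KEGG -> UniProt."""
    reverse: Dict[str, List[str]] = defaultdict(list)
    for source, targets in forward.items():
        for target in targets:
            # Avoid duplicate back-references if a pair appears twice.
            if source not in reverse[target]:
                reverse[target].append(source)
    return dict(reverse)
```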

Frequently Asked Questions

Which bioinformatics databases does BioServices support?

BioServices supports 40+ bioinformatics services, including: UniProt (proteins), KEGG (pathways), ChEMBL/ChEBI (compounds), PDB (structures), Reactome (pathways), PSICQUIC (interactions), QuickGO (Gene Ontology), BioMart, ArrayExpress, ENA, and more. See the official documentation for the full list.
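As one example from that list, PSICQUIC returns molecular interactions in the tab-separated MITAB 2.5 format, whose first two columns are the identifiers of interactors A and B. The sketch assumes `PSICQUIC.query(service, query)` returns MITAB rows either as strings or as pre-split lists, and that `"intact"` is an accepted service name. `mitab_pairs` and `query_interactions` are hypothetical helpers.

```python
from typing import Any, List, Optional, Sequence, Tuple, Union


def mitab_pairs(rows: Union[str, Sequence[Union[str, Sequence[str]]]]) -> List[Tuple[str, str]]:
    """Hypothetical helper: extract (interactor A, interactor B) IDs from MITAB 2.5 rows."""
    if isinstance(rows, str):
        rows = rows.splitlines()
    pairs: List[Tuple[str, str]] = []
    for row in rows:
        # Rows may arrive as raw tab-separated strings or as already-split lists.
        cols = row.split("\t") if isinstance(row, str) else list(row)
        # MITAB 2.5 columns 1 and 2 hold the unique IDs of interactors A and B.
        if len(cols) >= 2 and cols[0]:
            pairs.append((cols[0], cols[1]))
    return pairs


def query_interactions(term: str, database: str = "intact",
                       psicquic: Optional[Any] = None) -> List[Tuple[str, str]]:
    """Hypothetical helper: query one PSICQUIC provider and return interactor pairs."""
    if psicquic is None:
        from bioservices import PSICQUIC
        psicquic = PSICQUIC(verbose=False)
    return mitab_pairs(psicquic.query(database, term))
```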

What’s the difference between BioServices, gget, and Biopython?

BioServices: Suited to queries that span multiple databases; provides a unified interface to 40+ services and excels at ID mapping and database integration

gget: Better for quick single-database queries, command-line friendly

Biopython: Focuses on local handling of sequences and file formats; its online access is limited to a few services, such as NCBI Entrez and NCBI BLAST

These tools complement each other: retrieve data with BioServices, then analyze the sequences with Biopython.

How do I batch-convert protein IDs?

Use the UniProt service's mapping method to perform batch conversions. For example, to convert UniProt IDs to KEGG IDs:

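from bioservices import UniProt
u = UniProt()
# Comma-separated accessions are submitted as a single ID-mapping job
results = u.mapping(fr="UniProtKB_AC-ID", to="KEGG", query="P43403,P12345")
# Each hit is a dict such as {"from": "P43403", "to": "hsa:7535"}
print([(hit["from"], hit["to"]) for hit in results.get("results", [])])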

Compound ID conversions can be done using the UniChem service. See the project's batch_id_converter.py script for detailed batch conversion examples.
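For long ID lists, one option is to split the input into chunks and submit one mapping job per chunk. This is a sketch, not the project's batch_id_converter.py. The chunk size of 500 is an arbitrary assumption, not a documented UniProt limit, and `chunked` and `batch_map` are hypothetical names. It relies only on `UniProt.mapping(fr=..., to=..., query=...)`.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_map(ids: Iterable[str], fr: str = "UniProtKB_AC-ID", to: str = "KEGG",
              chunk_size: int = 500, uniprot: Optional[Any] = None) -> Dict[str, List[Any]]:
    """Hypothetical helper: map many IDs by sending them to UniProt in chunks."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    if uniprot is None:
        from bioservices import UniProt
        uniprot = UniProt(verbose=False)
    unique = list(dict.fromkeys(ids))  # de-duplicate while keeping input order
    mapped: Dict[str, List[Any]] = {}
    for chunk in chunked(unique, chunk_size):
        # One ID-mapping job per chunk of comma-separated IDs.
        result = uniprot.mapping(fr=fr, to=to, query=",".join(chunk))
        for hit in result.get("results", []):
            mapped.setdefault(hit["from"], []).append(hit["to"])
    return mapped
```

IDs that fail to map simply do not appear in the returned dict, so comparing its keys with the input list shows which conversions are missing.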
