gget

gget - Rapid Query Tool for Bioinformatics Databases

Overview

gget is a command-line bioinformatics tool and Python package that provides a unified query interface to 20+ genomic databases. It supports gene information retrieval, sequence analysis, protein structure prediction, gene expression queries, and disease association analysis.

Use Cases

1. Quick Gene Information Lookup

When you need basic gene information, sequences, protein structures, or related data, gget lets you query them from a concise command-line interface or Python API instead of visiting each database's website separately.
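
A quick lookup in Python might chain a free-text search with a metadata query. This is a minimal sketch, assuming gget's `search` and `info` functions behave as described in the gget documentation, and assuming the table returned by `search` has an `ensembl_id` column. The helper `lookup_gene` and its `gget_module` parameter are hypothetical; the parameter only exists so a stand-in can replace the real package when testing offline.

```python
def lookup_gene(query, species="homo_sapiens", gget_module=None):
    """Hypothetical helper: find Ensembl gene IDs for `query`, then fetch metadata.

    `gget_module` lets a stand-in object replace the real gget package.
    """
    if gget_module is None:
        import gget as gget_module  # real package: pip install gget

    # gget.search queries Ensembl with free-text search words for one species.
    # Assumption: the returned table has an "ensembl_id" column.
    hits = gget_module.search([query], species=species)
    ids = [] if hits is None else list(hits["ensembl_id"])
    if not ids:
        return ids, None

    # gget.info gathers Ensembl, UniProt, and NCBI metadata for a list of IDs.
    return ids, gget_module.info(ids)
```

For example, `ids, metadata = lookup_gene("ace2")` would return the matching human gene IDs and their combined metadata table (network access required).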

2. Sequence Analysis and Alignment

For routine tasks such as BLAST searches and multiple sequence alignment, gget integrates BLAST and BLAT (searches against NCBI and UCSC databases), MUSCLE (multiple sequence alignment), and DIAMOND (fast alignment against a reference set of sequences you provide).
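
A BLAST search from Python could look like the sketch below. It assumes `gget.blast(sequence, limit=...)`, which, as we read the gget documentation, chooses blastn or blastp from the sequence type by default and runs the search remotely at NCBI. The helpers `clean_sequence` and `blast_sequence` are hypothetical; they strip a FASTA header and whitespace before submitting the query.

```python
def clean_sequence(text):
    """Hypothetical helper: turn raw or FASTA-formatted text into one uppercase sequence."""
    lines = (line.strip() for line in text.strip().splitlines())
    # Drop FASTA header lines (starting with ">") and blank lines, then join the rest.
    return "".join(line for line in lines if line and not line.startswith(">")).upper()


def blast_sequence(text, limit=10, gget_module=None):
    """Hypothetical helper: run a remote BLAST search and return at most `limit` hits."""
    if gget_module is None:
        import gget as gget_module  # real package: pip install gget

    sequence = clean_sequence(text)
    if not sequence:
        raise ValueError("no sequence found in input")
    # Assumption: gget.blast picks the program and database by default
    # and accepts `limit` to cap the number of returned hits.
    return gget_module.blast(sequence, limit=limit)
```

Cleaning the input locally means a sequence pasted straight from a FASTA file can be submitted without manual editing.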

3. Gene Expression and Functional Analysis

When querying gene expression data, running enrichment analyses, or retrieving disease associations, gget integrates resources such as ARCHS4, Enrichr, OpenTargets, and cBioPortal, so these analyses can be run from a single tool.
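
As a rough illustration, the sketch below pairs an ARCHS4 tissue-expression query for one gene with an Enrichr analysis of a gene list. It assumes `gget.archs4(gene, which="tissue")` and `gget.enrichr(genes, database=...)` as described in the gget documentation, where `"ontology"` is, as we understand it, one of gget's shortcut names for a Gene Ontology library. The helper `expression_and_enrichment` is hypothetical.

```python
def expression_and_enrichment(gene, gene_list, database="ontology", gget_module=None):
    """Hypothetical helper: return (ARCHS4 tissue expression, Enrichr results)."""
    if gget_module is None:
        import gget as gget_module  # real package: pip install gget

    if not gene_list:
        raise ValueError("gene_list must contain at least one gene symbol")
    # ARCHS4: which="tissue" requests tissue-level expression
    # (assumption: the default mode returns correlated genes instead).
    tissue = gget_module.archs4(gene, which="tissue")
    # Enrichr: test the gene list against the chosen library or shortcut name.
    enrichment = gget_module.enrichr(list(gene_list), database=database)
    return tissue, enrichment
```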

Core Features

1. Unified Multi-Database Queries

Supports over 20 bioinformatics databases, including Ensembl, UniProt, NCBI, RCSB PDB, ARCHS4, Enrichr, Bgee, and OpenTargets. Every module can be used as a command-line tool or called directly from Python. Most modules return JSON (or CSV on request) on the command line and pandas DataFrames in Python, so results from different databases share a common format.
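
The Python output modes can be sketched as follows, assuming gget's `json` keyword (default `False`, which returns a pandas DataFrame) as described in the gget documentation. Writing CSV here uses the standard pandas `DataFrame.to_csv` method rather than any gget-specific option, and `fetch_info` is a hypothetical wrapper.

```python
def fetch_info(ens_ids, as_json=False, csv_path=None, gget_module=None):
    """Hypothetical wrapper: gget.info results as a DataFrame (default) or JSON-style data."""
    if gget_module is None:
        import gget as gget_module  # real package: pip install gget

    if as_json:
        # Assumption: json=True asks gget for JSON-formatted output instead of a DataFrame.
        return gget_module.info(ens_ids, json=True)

    df = gget_module.info(ens_ids)  # default: pandas DataFrame
    if csv_path is not None and df is not None:
        df.to_csv(csv_path)  # plain pandas export to CSV
    return df
```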

2. Sequence Analysis and Structure Prediction

Wrappers for BLAST and BLAT sequence searches, multiple sequence alignment (MUSCLE), and fast local alignment (DIAMOND). AlphaFold2 integration predicts protein structures and can generate PDB files and interactive 3D visualizations; additional modules query the RCSB PDB and predict linear motifs (ELM).
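
For the structure side, the sketch below fetches an entry with `gget.pdb` and runs a quick sanity check on it. It assumes `gget.pdb(pdb_id)` returns the structure as PDB-format text, per our reading of the gget documentation. The helpers `fetch_structure` and `count_atom_records` are hypothetical, and the record counting relies only on the standard PDB layout, where the record name occupies columns 1 to 6.

```python
def count_atom_records(pdb_text):
    """Hypothetical helper: count ATOM and HETATM records in PDB-format text."""
    return sum(
        1 for line in pdb_text.splitlines()
        if line[:6] in ("ATOM  ", "HETATM")  # record name is columns 1-6
    )


def fetch_structure(pdb_id, out_path=None, gget_module=None):
    """Hypothetical helper: download a PDB entry via gget and optionally save it."""
    if gget_module is None:
        import gget as gget_module  # real package: pip install gget

    # Assumption: gget.pdb returns the entry as PDB-format text.
    pdb_text = gget_module.pdb(pdb_id)
    if not pdb_text:
        raise ValueError(f"no structure returned for {pdb_id!r}")
    if out_path is not None:
        with open(out_path, "w") as handle:
            handle.write(pdb_text)
    return pdb_text, count_atom_records(pdb_text)
```

A zero atom count is a cheap signal that the download returned something other than coordinates.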

3. Expression Data and Disease Associations

Query ARCHS4 gene correlations and tissue expression data, access single-cell RNA-seq data (CELLxGENE), run GO, KEGG, and other enrichment analyses (Enrichr), retrieve disease and drug associations (OpenTargets), and analyze cancer genomics data (cBioPortal, COSMIC).
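
Disease and drug associations can be pulled per gene. The sketch assumes `gget.opentargets(ensembl_id, resource=..., limit=...)` with resource names `"diseases"` and `"drugs"`, based on our reading of the gget documentation; OpenTargets is keyed on human Ensembl gene IDs (`ENSG...`). The helper `gene_associations` is hypothetical.

```python
def gene_associations(ensembl_id, limit=10, gget_module=None):
    """Hypothetical helper: OpenTargets disease and drug associations for one gene."""
    if gget_module is None:
        import gget as gget_module  # real package: pip install gget

    if not ensembl_id.startswith("ENSG"):
        raise ValueError("expected a human Ensembl gene ID such as ENSG...")
    results = {}
    for resource in ("diseases", "drugs"):
        # One OpenTargets query per resource type, capped at `limit` rows each.
        results[resource] = gget_module.opentargets(ensembl_id, resource=resource, limit=limit)
    return results
```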

Frequently Asked Questions

Which databases does gget support?

gget currently supports 20+ bioinformatics databases, including: Ensembl (reference genomes, gene information), UniProt and NCBI (protein/gene metadata), RCSB PDB (protein structures), ARCHS4 (gene expression correlations), Enrichr (enrichment analysis), Bgee (homology and expression), OpenTargets (disease and drug associations), cBioPortal (cancer genomics), COSMIC (somatic mutations), UCSC (BLAT alignment), and others.

How is gget different from Biopython?

gget focuses on rapid queries and simple analyses, offering a unified command-line interface suited for interactive exploration and ad hoc queries. Biopython provides more comprehensive functionality for complex data processing and large-scale batch analyses. If you only need quick gene lookups, BLAST searches, or enrichment analyses, gget is simpler and more direct; if you need to customize complex analysis pipelines or handle large datasets, Biopython or bioservices may be more appropriate.

Can gget handle large-scale batch analyses?

gget is designed primarily for quick queries and interactive exploration and is not optimized for large-scale batch processing. It works well for querying one or a few genes and for individual sequence alignments, but for thousands of genes or large datasets we recommend dedicated batch-processing tools (e.g., Biopython) or custom scripts. The gget info documentation recommends querying no more than 1,000 Ensembl IDs at a time.
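
Staying under the 1,000-ID guideline for gget info can be handled by splitting the ID list into chunks and querying each chunk in turn. This is a minimal sketch: `chunked` and `info_in_batches` are hypothetical helpers, and the `batch_size` default simply mirrors the guideline above.

```python
def chunked(items, size):
    """Hypothetical helper: yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def info_in_batches(ens_ids, batch_size=1000, gget_module=None):
    """Hypothetical helper: call gget.info per batch and return the per-batch results."""
    if gget_module is None:
        import gget as gget_module  # real package: pip install gget

    ids = list(dict.fromkeys(ens_ids))  # drop duplicate IDs, keep first-seen order
    results = []
    for batch in chunked(ids, batch_size):
        result = gget_module.info(batch)
        if result is not None:  # skip batches that returned nothing
            results.append(result)
    return results
```

If each batch comes back as a DataFrame, `pandas.concat(results)` merges them into a single table.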
