geo-database

Access NCBI GEO for gene expression and genomics data. Search and download microarray and RNA-seq datasets (GSE, GSM, GPL) and retrieve SOFT and Series Matrix files for transcriptomics and expression analysis.

Category

Other Tools

Install

Download and extract to your skills directory

Copy command and send to OpenClaw for auto-install:

Download and install this skill https://openskills.cc/api/download?slug=k-dense-ai-scientific-skills-geo-database&locale=en&source=copy

GEO Database: Gene Expression Data Access Skill

Skill Overview


GEO Database is an AI skill for working with the NCBI Gene Expression Omnibus (GEO). It helps users search, download, and analyze gene expression and genomics data from over 260,000 studies.

Applicable Scenarios

1. Transcriptomics Research


When you need gene expression data for specific diseases, drug treatments, or experimental conditions, this skill can quickly retrieve microarray and RNA-seq datasets from GEO, including GSE series data, GSM sample data, and GPL platform annotations.
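A search like the one described above can be sketched with the NCBI E-utilities ESearch endpoint, which exposes GEO records through the `gds` database. The helper name `build_geo_search_url` and the example query are illustrative assumptions, not part of the skill itself. The sketch only builds the URL, so it runs without network access.

```python
from urllib.parse import urlencode

# Public ESearch endpoint of NCBI E-utilities.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_geo_search_url(term: str, retmax: int = 20) -> str:
    """Return an ESearch URL that queries GEO DataSets (db=gds).

    `build_geo_search_url` is a hypothetical helper name used for illustration.
    """
    params = {"db": "gds", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# Example query (an assumption): human breast cancer studies, series records only.
url = build_geo_search_url(
    'breast cancer AND "Homo sapiens"[Organism] AND gse[Entry Type]'
)
print(url)
```

Fetching the URL (for example with `urllib.request.urlopen`) returns a JSON document whose `esearchresult.idlist` holds the matching record IDs. NCBI asks clients to keep request rates low unless they supply an API key.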

2. Differential Expression Analysis


For studies that need differential gene expression analysis, this skill can help download Series Matrix files, parse them with the GEOparse library, and perform statistical analysis, clustering, visualization, and meta-analysis using the Python data science stack (pandas, scipy, statsmodels).

3. Bioinformatics Data Integration


When your research involves integrative analysis of multiple independent datasets, this skill supports bulk downloading of multiple GEO series, harmonizing data formats across platforms, extracting expression profiles of key genes, and performing cross-study meta-analysis.

Core Features

1. Intelligent Data Retrieval and Download


Supports several ways of accessing GEO data: downloading a complete GSE series in a single call with the GEOparse library, precise searching via the NCBI E-utilities API, or direct retrieval of raw SOFT/MINiML files via FTP. Automatically handles the data hierarchy (Series/Sample/Platform/DataSet) and supports filtering by sample metadata and extracting subsets.
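For the direct-download route, the GEO FTP site groups series into directories whose names replace the last three digits of the accession with "nnn" (for example GSE1nnn for GSE1563). The sketch below builds the HTTPS path to a Series Matrix file under that convention. The function name `series_matrix_url` is an illustrative assumption, and the sketch assumes a single-platform series. Multi-platform series are believed to use file names that also contain the GPL accession, so check the directory listing in that case.

```python
import re

# HTTPS mirror of the GEO FTP tree for series records.
GEO_FTP_ROOT = "https://ftp.ncbi.nlm.nih.gov/geo/series"


def series_matrix_url(accession: str) -> str:
    """Build the Series Matrix URL for a single-platform GSE accession.

    `series_matrix_url` is a hypothetical helper name used for illustration.
    """
    match = re.fullmatch(r"GSE(\d+)", accession.strip().upper())
    if match is None:
        raise ValueError(f"not a GSE accession: {accession!r}")
    digits = match.group(1)
    # Directory stub: drop the last three digits and append "nnn".
    stub = f"GSE{digits[:-3]}nnn"
    acc = f"GSE{digits}"
    return f"{GEO_FTP_ROOT}/{stub}/{acc}/matrix/{acc}_series_matrix.txt.gz"


print(series_matrix_url("GSE1563"))
```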

2. Data Parsing and Preprocessing


Automatically parses SOFT and Series Matrix files, extracting expression matrices and sample metadata. Supports preprocessing steps such as log2 transformation, missing-value handling, and quality-control visualization, and can convert GEO data directly into a pandas DataFrame for downstream analysis.
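To make the parsing step concrete, here is a minimal standard-library sketch of reading a Series Matrix file. In that format, metadata lines start with "!" and the expression table sits between the `!series_matrix_table_begin` and `!series_matrix_table_end` markers. The function names and the toy input (including the made-up sample names GSM1 and GSM2) are assumptions for illustration. This is not the skill's actual implementation.

```python
import csv
import math


def _to_float(cell):
    """Convert a table cell to float; empty or non-numeric cells become None."""
    try:
        return float(cell)
    except ValueError:
        return None


def parse_series_matrix(text):
    """Split Series Matrix text into (metadata, sample names, expression rows)."""
    metadata, table_lines, in_table = {}, [], False
    for line in text.splitlines():
        if line.startswith("!series_matrix_table_begin"):
            in_table = True
        elif line.startswith("!series_matrix_table_end"):
            in_table = False
        elif in_table:
            table_lines.append(line)
        elif line.startswith("!"):
            fields = next(csv.reader([line], delimiter="\t"))
            # A key may repeat (e.g. sample characteristics), so keep every row.
            metadata.setdefault(fields[0][1:], []).append(fields[1:])
    rows = [row for row in csv.reader(table_lines, delimiter="\t") if row]
    if not rows:
        return metadata, [], {}
    samples = rows[0][1:]  # header row: ID_REF followed by sample accessions
    expression = {row[0]: [_to_float(c) for c in row[1:]] for row in rows[1:]}
    return metadata, samples, expression


def log2_transform(values, pseudocount=1.0):
    """log2(x + pseudocount) for non-negative values; missing values stay None."""
    return [None if v is None else math.log2(v + pseudocount) for v in values]


# Toy input with made-up sample names and values.
SAMPLE_TEXT = "\n".join([
    '!Series_title\t"Toy example"',
    '!Sample_title\t"control 1"\t"treated 1"',
    "!series_matrix_table_begin",
    '"ID_REF"\t"GSM1"\t"GSM2"',
    '"probe_a"\t3\t7',
    '"probe_b"\t\t15',
    "!series_matrix_table_end",
])

metadata, samples, expression = parse_series_matrix(SAMPLE_TEXT)
print(samples, expression)
print(log2_transform(expression["probe_a"]))
```

The resulting dictionary can be passed to `pandas.DataFrame.from_dict(expression, orient="index", columns=samples)` when pandas is available. Many GEO matrices are already log-transformed, so inspect the value range before applying `log2_transform`.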

3. Analysis Workflow Support


Provides common analysis templates including differential expression analysis, sample correlation heatmaps, and hierarchical clustering. Supports batch processing of multiple datasets, cross-study gene expression meta-analysis, and integration with the GEO2R online tool for rapid exploratory analysis.
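As one possible shape for a differential expression template, the sketch below ranks genes by Welch's t statistic on log2-scale values and reports the log2 fold change. It uses only the standard library, and the function names, gene names, and values are invented for illustration. It is a simplified stand-in rather than the skill's actual template. Dedicated tools apply moderated statistics and multiple-testing correction, which this sketch omits.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic (b minus a) and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    va, vb = variance(group_a) / na, variance(group_b) / nb
    se2 = va + vb
    if se2 == 0:
        raise ValueError("both groups have zero variance; t is undefined")
    t = (mean(group_b) - mean(group_a)) / math.sqrt(se2)
    df = se2 ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


def rank_genes(log2_expr, control_idx, treated_idx):
    """Return (gene, log2 fold change, t, df) tuples sorted by |t|, largest first."""
    results = []
    for gene, values in log2_expr.items():
        control = [values[i] for i in control_idx]
        treated = [values[i] for i in treated_idx]
        # On log2-scale data the fold change is a difference of means.
        log2fc = mean(treated) - mean(control)
        t, df = welch_t(control, treated)
        results.append((gene, log2fc, t, df))
    return sorted(results, key=lambda r: abs(r[2]), reverse=True)


# Made-up log2 expression values: columns 0-2 are controls, 3-5 are treated.
toy = {
    "gene_flat": [4.0, 5.0, 6.0, 4.0, 5.0, 6.0],
    "gene_up": [1.0, 2.0, 3.0, 5.0, 6.0, 7.0],
}
ranked = rank_genes(toy, control_idx=[0, 1, 2], treated_idx=[3, 4, 5])
for gene, log2fc, t, df in ranked:
    print(f"{gene}: log2FC={log2fc:.2f} t={t:.3f} df={df:.1f}")
```

To obtain p-values, `scipy.stats.ttest_ind(treated, control, equal_var=False)` performs the same Welch test when scipy is installed.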

Frequently Asked Questions

What is the GEO database?


GEO (Gene Expression Omnibus) is a public database maintained by the National Center for Biotechnology Information (NCBI) that archives high-throughput gene expression and functional genomics data. As of 2024, GEO contains over 264,000 studies and 8 million samples, covering both microarray and RNA sequencing data, and is one of the most widely used gene expression data resources in biomedical research.

How do I download data from GEO?


There are several ways to download GEO data. The recommended method is the Python GEOparse library: a single call such as GEOparse.get_GEO("GSE123456") downloads and parses the complete dataset. For bulk downloads, you can access the NCBI FTP site directly to obtain Series Matrix or SOFT files. If you only need metadata, you can query the NCBI E-utilities API without downloading the full files.
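The GEOparse route can be wrapped in a small helper, sketched below. It relies on `GEOparse.get_GEO` and the `pivot_samples` method of the returned series object. The wrapper name `load_expression_matrix` and its defaults are assumptions for illustration. The import is placed inside the function so the file can be read or imported even where GEOparse is not installed.

```python
def load_expression_matrix(accession, destdir="./", value_column="VALUE"):
    """Download a GSE with GEOparse and return a probes-by-samples DataFrame.

    `load_expression_matrix` is a hypothetical wrapper name. Calling it
    requires the GEOparse package and network access.
    """
    # Lazy import: only needed when the function is actually called.
    import GEOparse

    # Downloads the SOFT file into destdir (or reuses a cached copy) and parses it.
    gse = GEOparse.get_GEO(geo=accession, destdir=destdir, silent=True)

    # Each GSM carries a data table; pivot them into one matrix keyed by probe ID.
    return gse.pivot_samples(value_column)


# Usage (not executed here because it needs network access):
#   matrix = load_expression_matrix("GSE123456")  # placeholder accession from the text
#   print(matrix.shape)
```

Sample metadata is also available on the parsed object, for example through `gse.gsms` (a dictionary of samples) and `gse.phenotype_data`. For many RNA-seq series the per-sample tables are empty because counts are shipped as supplementary files, in which case the pivot is not useful and the supplementary files have to be downloaded instead.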

What is the difference between GSE, GSM, and GPL?


GEO organizes data in a hierarchical structure: GSE (Series) represents a complete study, including the experimental design, its samples, and overall information; GSM (Sample) is a single experimental sample or biological replicate, containing that sample's data and protocol information; GPL (Platform) describes the microarray or sequencing platform used, including probe/feature annotations. Simply put: GSE is the project, GSM is the specific sample, and GPL is the detection platform.
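Because the record type is encoded in the accession prefix, code that handles mixed lists of accessions can dispatch on it. The sketch below is illustrative: the function name `classify_accession` is an assumption, and the GDS prefix is included for the curated DataSet records mentioned in the feature list above.

```python
import re

# Accession prefix -> GEO record type.
RECORD_TYPES = {
    "GSE": "Series",
    "GSM": "Sample",
    "GPL": "Platform",
    "GDS": "DataSet",
}


def classify_accession(accession: str) -> str:
    """Return the GEO record type for an accession such as 'GSM12345'.

    `classify_accession` is a hypothetical helper name used for illustration.
    """
    match = re.fullmatch(r"(GSE|GSM|GPL|GDS)\d+", accession.strip().upper())
    if match is None:
        raise ValueError(f"not a GEO accession: {accession!r}")
    return RECORD_TYPES[match.group(1)]


for acc in ["GSE123456", "GSM12345", "GPL570"]:
    print(acc, "->", classify_accession(acc))
```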