Scanpy Skills - Complete Workflow for Single-Cell RNA-seq Analysis in Python

Scanpy - Python single-cell RNA-seq analysis toolkit

Skill Overview

Scanpy is a scalable Python toolkit for analyzing single-cell RNA-seq (scRNA-seq) data. It covers the full workflow from quality control through cell-type annotation, so researchers can process and analyze scRNA-seq datasets within a single framework.

Applicable Scenarios

1. Single-cell RNA-seq data analysis

Supports importing and processing single-cell data in various formats such as 10X Genomics, h5ad (AnnData), and CSV, suitable for exploratory data analysis and publication-quality results.

2. Cell clustering and visualization

Visualize data using dimensionality reduction methods like UMAP, t-SNE, and PCA; perform cell clustering with the Leiden algorithm to identify cell subpopulations and discover marker genes.

3. Cell-type annotation and trajectory analysis

Annotate cell types based on known marker genes and support trajectory inference methods such as PAGA and diffusion pseudotime to reveal cell differentiation paths.
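
The two steps above can be sketched roughly as follows. This assumes clusters are already stored in adata.obs["leiden"] and that a neighbor graph has been computed with sc.pp.neighbors. The helper names, the cluster_to_type mapping, and the root_cell argument are hypothetical; the mapping and the root cell have to be chosen from biological knowledge of the dataset.

```python
def annotate_clusters(adata, cluster_to_type, cluster_key="leiden"):
    """Label each cell with the cell type assigned to its cluster.

    cluster_to_type is a hand-curated dict such as {"0": "T cell"}; it is
    illustrative only and must come from inspecting known marker genes.
    Clusters missing from the dict end up as missing values.
    """
    adata.obs["cell_type"] = adata.obs[cluster_key].map(cluster_to_type)
    return adata


def infer_trajectory(adata, root_cell, groups="leiden"):
    """Run PAGA and diffusion pseudotime on an AnnData with a neighbor graph.

    root_cell is the integer position of the cell used as pseudotime origin.
    """
    import scanpy as sc  # imported here so the helpers load without scanpy

    sc.tl.paga(adata, groups=groups)  # cluster-level connectivity graph
    sc.tl.diffmap(adata)              # diffusion map that DPT builds on
    adata.uns["iroot"] = root_cell    # DPT reads the root from uns["iroot"]
    sc.tl.dpt(adata)                  # writes adata.obs["dpt_pseudotime"]
    return adata
```

After running both helpers, sc.pl.paga(adata) and sc.pl.umap(adata, color=["cell_type", "dpt_pseudotime"]) are the usual ways to inspect the result.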

Core Features

1. Quality control and preprocessing

Provides a complete QC and preprocessing workflow: computing the mitochondrial gene percentage, filtering low-quality cells and rarely detected genes, normalization, highly variable gene selection, and batch correction.
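
A minimal sketch of these steps, in the order commonly used, is shown here. The function name preprocess and all default values are assumptions for illustration, the "MT-" prefix assumes human gene symbols (mouse data typically uses "mt-"), and batch correction is left out because the right method depends on the experimental design.

```python
def preprocess(adata, min_genes=200, min_cells=3, max_pct_mt=20.0,
               n_top_genes=2000):
    """QC-filter, normalize, and flag highly variable genes.

    Expects raw counts in adata.X and returns a new, filtered AnnData.
    """
    import scanpy as sc  # imported here so the helper loads without scanpy

    # Drop near-empty cells and genes detected in very few cells.
    sc.pp.filter_cells(adata, min_genes=min_genes)
    sc.pp.filter_genes(adata, min_cells=min_cells)

    # Flag mitochondrial genes, then compute per-cell QC metrics.
    # This adds adata.obs["pct_counts_mt"] among other columns.
    adata.var["mt"] = adata.var_names.str.startswith("MT-")
    sc.pp.calculate_qc_metrics(adata, qc_vars=["mt"], percent_top=None,
                               log1p=False, inplace=True)

    # Remove cells with a high mitochondrial fraction.
    adata = adata[adata.obs["pct_counts_mt"] < max_pct_mt].copy()

    # Library-size normalization, log transform, then HVG selection.
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)
    sc.pp.highly_variable_genes(adata, n_top_genes=n_top_genes)
    return adata
```

Because the mitochondrial filter creates a copy, the result should be reassigned: adata = preprocess(adata).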

2. Dimensionality reduction, clustering, and visualization

Supports PCA for linear dimensionality reduction and UMAP and t-SNE for non-linear embedding; Leiden/Louvain graph clustering; and generates publication-quality cell distribution plots and heatmaps.
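
One common way to chain these steps is sketched below, assuming adata has already been normalized and log-transformed. The function name cluster_cells and the default parameter values are illustrative rather than recommended settings, and sc.tl.leiden needs an optional Leiden backend (such as the leidenalg package) to be installed.

```python
def cluster_cells(adata, n_pcs=30, n_neighbors=15, resolution=1.0):
    """PCA -> neighbor graph -> UMAP embedding -> Leiden clusters (in place)."""
    import scanpy as sc  # imported here so the helper loads without scanpy

    # Linear reduction; results land in adata.obsm["X_pca"].
    sc.pp.pca(adata, n_comps=n_pcs)

    # k-nearest-neighbor graph built on the PCA representation.
    sc.pp.neighbors(adata, n_neighbors=n_neighbors, n_pcs=n_pcs)

    # Non-linear 2D embedding for plotting; stored in adata.obsm["X_umap"].
    sc.tl.umap(adata)

    # Graph clustering; labels are stored in adata.obs["leiden"].
    sc.tl.leiden(adata, resolution=resolution)
    return adata
```

The clusters can then be drawn on the embedding with sc.pl.umap(adata, color="leiden"). A higher resolution value generally yields more, smaller clusters.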

3. Differential expression and marker gene identification

Performs differential expression analysis between groups using methods such as the Wilcoxon rank-sum test, identifies cluster-specific marker genes, and supports various visualizations including dot plots, heatmaps, and violin plots.
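
As a hedged illustration of marker detection, the sketch below ranks genes per cluster with the Wilcoxon test and returns the results as a table. It assumes log-normalized expression values and cluster labels in adata.obs["leiden"]; the function name find_markers and its defaults are hypothetical.

```python
def find_markers(adata, groupby="leiden", n_genes=5, plot=False):
    """Rank genes for each group against all other cells.

    Returns a long-format table with one row per (group, gene) pair,
    including scores, log fold changes, and adjusted p-values.
    """
    import scanpy as sc  # imported here so the helper loads without scanpy

    # One-vs-rest Wilcoxon rank-sum test; stored in adata.uns.
    sc.tl.rank_genes_groups(adata, groupby=groupby, method="wilcoxon")

    # group=None collects the results for every group into one table.
    markers = sc.get.rank_genes_groups_df(adata, group=None)

    if plot:
        # Dot plot of the top n_genes markers per group.
        sc.pl.rank_genes_groups_dotplot(adata, n_genes=n_genes)
    return markers
```

The returned table can be filtered further, for example on the pvals_adj and logfoldchanges columns, before choosing markers for annotation.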

Frequently Asked Questions

What data formats does Scanpy support?

Scanpy supports a variety of mainstream single-cell data formats, including 10X Genomics MTX and H5 formats, AnnData h5ad format, and generic CSV/TSV table formats. Data can be imported using functions like sc.read_10x_mtx(), sc.read_10x_h5(), sc.read_h5ad(), and sc.read_csv().
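
To make the mapping from format to reader concrete, here is a small sketch that picks one of the functions named above from a file path. The helpers pick_reader and load are hypothetical, and the extension rules are a simplifying assumption: a path without an extension is treated as a 10x MTX directory, and every .h5 file is assumed to be 10x output.

```python
from pathlib import Path


def pick_reader(path):
    """Return (scanpy reader name, extra keyword arguments) for a path."""
    suffix = Path(path).suffix.lower()
    if suffix == ".h5ad":
        return "read_h5ad", {}
    if suffix == ".h5":
        return "read_10x_h5", {}
    if suffix == ".csv":
        return "read_csv", {}
    if suffix in (".tsv", ".txt"):
        # read_csv also handles tab-separated tables via its delimiter.
        return "read_csv", {"delimiter": "\t"}
    if suffix == "":
        # Assumed to be a directory holding matrix.mtx plus barcode and
        # feature files, as written by the 10x pipeline.
        return "read_10x_mtx", {}
    raise ValueError(f"no reader known for {path!r}")


def load(path):
    """Load a dataset into an AnnData object using the matching reader."""
    import scanpy as sc  # imported here so pick_reader works without scanpy

    name, kwargs = pick_reader(path)
    return getattr(sc, name)(path, **kwargs)
```

Note that AnnData stores cells as rows and genes as columns, so a gene-by-cell table loaded from CSV or TSV usually needs to be transposed afterwards (adata = adata.T).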

What is the difference between Scanpy and Seurat?

Both are mainstream single-cell analysis tools. The main differences lie in programming language and ecosystem. Scanpy uses Python and integrates seamlessly with the scverse ecosystem (squidpy, scvi-tools, cellrank), making it suitable for workflows requiring deep learning or custom analyses; Seurat uses R and has strengths in statistical analysis and visualization. The choice mainly depends on the team's tech stack and specific needs.

How should single-cell QC thresholds be set?

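Common starting points are a minimum number of detected genes per cell (min_genes: 200–500), a minimum number of cells in which a gene is detected (min_cells: 3–10), and an upper limit on the mitochondrial percentage per cell (pct_counts_mt: 5–20%). min_genes and min_cells are arguments to sc.pp.filter_cells() and sc.pp.filter_genes(); pct_counts_mt is a per-cell column that sc.pp.calculate_qc_metrics() writes to adata.obs when mitochondrial genes are flagged in a var column named mt, so that cutoff is applied by subsetting the data. Thresholds should be adjusted to the data quality, tissue type, and experimental design; inspect the QC metric distributions with violin plots before deciding.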
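
One way to compare candidate thresholds before committing to them is sketched below. It assumes sc.pp.calculate_qc_metrics has already been run with qc_vars=["mt"], so that the n_genes_by_counts, total_counts, and pct_counts_mt columns exist in adata.obs. Both helper names are hypothetical.

```python
def plot_qc_violins(adata):
    """Show the distributions of the main per-cell QC metrics."""
    import scanpy as sc  # imported here so the helpers load without scanpy

    sc.pl.violin(
        adata,
        ["n_genes_by_counts", "total_counts", "pct_counts_mt"],
        jitter=0.4,
        multi_panel=True,
    )


def qc_pass_fraction(n_genes, pct_mt, min_genes=200, max_pct_mt=20.0):
    """Fraction of cells that would be kept under the given thresholds.

    n_genes and pct_mt are equal-length sequences with one value per cell,
    for example adata.obs["n_genes_by_counts"] and adata.obs["pct_counts_mt"].
    """
    total = len(n_genes)
    if total == 0:
        raise ValueError("no cells to evaluate")
    kept = sum(
        1
        for genes, mt in zip(n_genes, pct_mt)
        if genes >= min_genes and mt <= max_pct_mt
    )
    return kept / total
```

Calling qc_pass_fraction with a few values from the ranges above shows how many cells each choice would discard; a cutoff that removes a large share of cells is a signal to revisit the threshold or the sample quality.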
