anndata
Data structure for annotated matrices in single-cell analysis. Use when working with .h5ad files or integrating with the scverse ecosystem. This is the data format skill: for analysis workflows use scanpy; for probabilistic models use scvi-tools; for population-scale queries use cellxgene-census.
Category: File Management
Install
Download and extract to your skills directory
Copy command and send to OpenClaw for auto-install:
Download and install this skill https://openskills.cc/api/download?slug=k-dense-ai-scientific-skills-anndata&locale=en&source=copy
AnnData Skill Details
Overview
AnnData is a data structure in Python for handling annotated data matrices, specifically designed for single-cell genomics analyses. It efficiently stores experimental measurements, cell annotations, and gene metadata, and supports multiple formats such as h5ad and Zarr.
Applicable Scenarios
1. Single-cell RNA-seq Data Analysis
When processing scRNA-seq data, AnnData holds the gene expression matrix together with cell-type annotations, sample origins, and other metadata, and Scanpy operates on it directly across the full analysis workflow, from quality control to clustering.
2. Large-scale Genomic Data Storage and Processing
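When a dataset is larger than available memory, opening it in AnnData's backed mode keeps the expression matrix (X) on disk and reads only the slices you access, while the much smaller obs and var annotations are loaded into memory. This avoids reading the entire file at once and the out-of-memory errors that would follow.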
3. Integration of Multi-batch Experimental Data
When handling multiple experimental batches or data from different sample sources, AnnData's concat function combines them using an inner join (keep only the genes shared by all inputs) or an outer join (keep the union of genes), and can record each cell's source in an obs column through its label argument.
Core Features
1. Multi-format Data I/O
Supports the native h5ad format and the chunked Zarr format (well suited to cloud storage), as well as common genomics formats such as CSV, MTX, Loom, and 10x Genomics outputs (the 10x readers are provided by Scanpy), which makes import, export, and format conversion straightforward.
2. Efficient Data Operations
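Provides condition-based subsetting, transposition, sparse matrix conversion, and other operations. Subsetting returns a lightweight view by default, which avoids duplicating data; calling .copy() produces an independent object, and modifying a view makes AnnData copy it implicitly (with a warning) rather than silently altering the parent.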
3. Integration with the scverse Ecosystem
As the foundational data structure for tools like Scanpy, scvi-tools, and Muon, AnnData seamlessly connects single-cell analysis, probabilistic modeling, and multimodal data workflows, and also supports PyTorch DataLoader for deep learning.
Frequently Asked Questions
What is the difference between AnnData and Scanpy?
AnnData is a data structure responsible for storing and managing annotated data matrices; Scanpy is an analysis toolkit built on AnnData that provides functions for quality control, normalization, dimensionality reduction, clustering, and more. In short, AnnData is the "container" and Scanpy is the "toolset."
How can I avoid running out of memory when working with large datasets?
Open files in AnnData's backed mode (backed='r') so that X is read from disk only when needed; store sparse data in CSR or CSC format; and process large matrices in chunks so that only a small portion is in memory at a time.
Can h5ad files be opened in other software?
h5ad is AnnData's on-disk format, built on the open HDF5 standard rather than a proprietary one. Generic HDF5 tools such as h5py can read the underlying groups and datasets, but some elements (for example, categorical annotations) use AnnData-specific encodings, so AnnData or compatible scverse tools are recommended for complete parsing. For interoperability with other tools, you can export to common formats such as CSV or MTX.