cellxgene-census
Query the CELLxGENE Census (61M+ cells) programmatically. Use when you need expression data across tissues, diseases, or cell types from the largest curated single-cell atlas. Best for population-scale queries and reference atlas comparisons. To analyze your own data, use scanpy or scvi-tools.
CELLxGENE Census - 61M+ Single-Cell Genomic Data Queries
Capabilities Overview
CZ CELLxGENE Census is a Python tool for programmatically accessing over 61 million single-cell genomic profiles, providing a standardized gene expression query interface across tissues, diseases, and cell types.
Use Cases
When you need to query single-cell expression data spanning multiple tissues, disease types, or cell categories, Census provides a unified API to access over 61 million standardized human and mouse cells.
Suitable for comparing your own single-cell data against large reference atlases, or using Census data as a training set for cell type annotation and machine learning model development.
When you need to analyze thousands of single-cell datasets simultaneously for population-scale analyses across tissues and diseases, Census provides unified metadata and expression matrices so you don't need to download and process raw data separately.
Core Features
Supports filtering queries by metadata fields such as cell type, tissue, disease, and donor ID, returning the precise subset of matching cells and their gene expression data. It also provides the is_primary_data flag to avoid double counting.
Query results can be returned directly in AnnData format, fully compatible with the scanpy ecosystem, supporting standard workflows for dimensionality reduction, clustering, visualization, and differential expression analysis.
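As a minimal sketch of the query path described above, the following opens the Census and pulls one filtered slice into an AnnData object. It assumes the cellxgene-census package is installed and that network access is available. The filter values and the two gene names are illustrative choices, and fetch_lung_t_cells is a hypothetical helper name, not part of the package.

```python
# Illustrative filter values (assumptions); adjust to the cells you need.
OBS_FILTER = (
    "cell_type == 'T cell' and tissue_general == 'lung' "
    "and is_primary_data == True"
)
VAR_FILTER = "feature_name in ['CD4', 'CD8A']"  # assumed example genes


def fetch_lung_t_cells(census_version="stable"):
    """Return an AnnData of primary lung T cells (hypothetical helper)."""
    # Imported inside the function so that defining it does not require
    # the package to be installed.
    import cellxgene_census

    # open_soma() works as a context manager; the handle closes on exit.
    with cellxgene_census.open_soma(census_version=census_version) as census:
        return cellxgene_census.get_anndata(
            census,
            organism="Homo sapiens",
            obs_value_filter=OBS_FILTER,
            var_value_filter=VAR_FILTER,
        )
```

Calling adata = fetch_lung_t_cells() downloads the matching cells over the network. Restricting the genes with var_value_filter keeps the returned matrix small while you experiment with a filter.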
For queries that exceed available memory, supports out-of-core iterative processing modes and can provide a PyTorch DataLoader interface for training deep learning models.
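One way the out-of-core mode might look is sketched below: the expression matrix is read chunk by chunk and reduced to per-gene totals, so the full slice never has to fit in memory. This assumes the tiledbsoma package that cellxgene-census builds on, and that each chunk arrives as an Arrow table with soma_dim_1 (gene index) and soma_data (value) columns. Both function names are hypothetical, and the PyTorch DataLoader path is not shown.

```python
from collections import defaultdict


def sum_counts_per_gene(chunks):
    """Accumulate per-gene totals from (gene_ids, values) chunk pairs."""
    totals = defaultdict(float)
    for gene_ids, values in chunks:
        # Plain Python loop for clarity; vectorize with NumPy for real workloads.
        for gene_id, value in zip(gene_ids, values):
            totals[int(gene_id)] += float(value)
    return dict(totals)


def stream_x_chunks(value_filter, organism_key="homo_sapiens"):
    """Yield (gene_ids, values) arrays for matching cells, one chunk at a time."""
    # Imported inside the function so that defining it does not require
    # the packages to be installed.
    import cellxgene_census
    import tiledbsoma as soma

    with cellxgene_census.open_soma() as census:
        experiment = census["census_data"][organism_key]
        with experiment.axis_query(
            measurement_name="RNA",
            obs_query=soma.AxisQuery(value_filter=value_filter),
        ) as query:
            # Each table holds sparse coordinates and values for part of X.
            for table in query.X("raw").tables():
                yield (
                    table["soma_dim_1"].to_numpy(),
                    table["soma_data"].to_numpy(),
                )
```

A call such as sum_counts_per_gene(stream_x_chunks("tissue_general == 'lung' and is_primary_data == True")) would then hold only one chunk in memory at a time. Keeping the reduction separate from the streaming makes it easy to check on small in-memory inputs.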
Frequently Asked Questions
What is CELLxGENE Census and how do I access it?
CZ CELLxGENE Census is a single-cell genomics database maintained by the Chan Zuckerberg Initiative, integrating thousands of standardized datasets and containing over 61 million cells. Using the cellxgene-census Python package, you can open the database with open_soma() and query data with get_anndata() or get_obs().
How do I download single-cell gene expression data?
Use cellxgene_census.get_anndata() to query and retrieve expression data by conditions. For example, get_anndata(census, organism="Homo sapiens", obs_value_filter="cell_type == 'T cell' and tissue_general == 'lung'") returns the expression matrix for lung T cells. Add the is_primary_data == True condition to the filter to avoid duplicate cells.
Can cellxgene-census be used with scanpy?
Yes. Census query results can be returned directly as AnnData objects, fully compatible with the scanpy workflow. You can perform normalization, dimensionality reduction, clustering, and visualization as you would with any regular AnnData object.
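The scanpy steps mentioned above could be sketched roughly as follows. It assumes scanpy is installed and that adata holds raw counts, as returned by a Census query. The parameter values are common defaults chosen for illustration, and basic_scanpy_workflow is a hypothetical helper name.

```python
def basic_scanpy_workflow(adata, n_top_genes=2000):
    """Normalize, reduce, and embed an AnnData in place (hypothetical helper)."""
    # Imported inside the function so that defining it does not require scanpy.
    import scanpy as sc

    # Keep the raw counts before normalization overwrites X.
    adata.layers["counts"] = adata.X.copy()

    sc.pp.normalize_total(adata, target_sum=1e4)  # library-size normalization
    sc.pp.log1p(adata)                            # log-transform
    sc.pp.highly_variable_genes(adata, n_top_genes=n_top_genes)
    sc.pp.pca(adata)                              # dimensionality reduction
    sc.pp.neighbors(adata)                        # kNN graph for embedding
    sc.tl.umap(adata)                             # 2D embedding for plotting
    return adata
```

After this runs, a call such as sc.pl.umap(adata, color="cell_type") should plot the embedding colored by the Census cell type labels, provided that column was returned with the query.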
How many cells are in the Census?
As of the latest version, the Census contains over 61 million single cells from human and mouse, covering thousands of source datasets and supporting multidimensional queries by tissue, disease, cell type, and more.
How do I filter by tissue or cell type?
Use the obs_value_filter parameter to specify filtering conditions, for example obs_value_filter="tissue_general == 'brain' and cell_type == 'neuron'". The filter supports the logical operators and and or, as well as the in operator for multi-value filtering.
Is cellxgene-census free?
Yes, CELLxGENE Census is a completely free and open data resource funded and maintained by the Chan Zuckerberg Initiative, and it can be accessed without registration.
Which species are supported?
Currently it primarily supports single-cell data from human (Homo sapiens) and mouse (Mus musculus).
How do I avoid duplicate cells in query results?
Add the is_primary_data == True filter to all queries. Cells in the Census may appear in multiple datasets; this flag ensures only unique primary cells are returned.
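Because the primary-data condition is easy to forget, one option is to assemble filter strings through a small helper that appends it by default. The sketch below is plain Python with no Census dependency. build_obs_filter is a hypothetical name, and it covers only equality and in conditions joined with and.

```python
def build_obs_filter(primary_only=True, **conditions):
    """Build an obs_value_filter string from column=value keyword arguments.

    A list or tuple value becomes an `in` clause; any other value becomes
    an equality clause. Clauses are joined with `and`.
    """
    clauses = []
    for column, value in conditions.items():
        if isinstance(value, (list, tuple)):
            clauses.append(f"{column} in {list(value)!r}")
        else:
            clauses.append(f"{column} == {value!r}")
    if primary_only:
        # Guard against counting the same cell from several datasets.
        clauses.append("is_primary_data == True")
    return " and ".join(clauses)


brain_neurons = build_obs_filter(tissue_general="brain", cell_type="neuron")
multi_tissue = build_obs_filter(tissue_general=["lung", "blood"])
```

The resulting strings can be passed as the obs_value_filter argument of get_anndata(). Passing primary_only=False leaves the condition out for the rare case where duplicates are wanted.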