scvi-tools
Deep generative models for single-cell omics. Use when you need probabilistic batch correction (scVI), transfer learning, differential expression with uncertainty, or multimodal integration (totalVI, MultiVI). Best for advanced modeling, batch effects, and multimodal data. For standard analysis pipelines, use scanpy.
Category: AI Skill Development
scvi-tools - Deep Generative Modeling Framework for Single-Cell Omics
Overview of Capabilities
scvi-tools is a Python framework built on PyTorch and PyTorch Lightning, designed specifically for single-cell genomics. It provides deep generative models and variational inference methods for analyzing multiple single-cell data modalities.
Use Cases
1. Batch Correction and Integration of Single-Cell Sequencing Data
When you need to integrate single-cell data from different batches, labs, or platforms, the probabilistic batch-correction methods in scvi-tools can reduce technical variation while preserving biological variation. This suits combining and jointly analyzing datasets from multiple experiments.
2. Joint Analysis of Multimodal Single-Cell Data
When you have multimodal data such as CITE-seq or multiome, scvi-tools provides dedicated models: totalVI jointly models RNA and surface proteins, and MultiVI integrates paired and unpaired multi-omic datasets (for example RNA and chromatin accessibility), so that all modalities contribute to a shared cell representation.
3. Advanced Statistical Inference with Uncertainty
If you need to perform differential expression analysis, cell-type annotation, or RNA velocity analysis and want probabilistic measures of uncertainty, the Bayesian models in scvi-tools report posterior uncertainty alongside point estimates, which makes it easier to judge how much confidence a given conclusion deserves.
Core Features
1. Deep Generative Models with a Unified API
scvi-tools offers more than 20 model implementations covering single-cell RNA-seq (scVI, scANVI), ATAC-seq (PeakVI, PoissonVI), multimodal integration (totalVI, MultiVI), and spatial transcriptomics (DestVI, Stereoscope). All models follow a consistent API pattern: register data → train model → extract results. They operate on AnnData objects, so they fit directly into scanpy workflows.
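The register → train → extract pattern can be sketched as one generic helper. This is a minimal sketch, not part of scvi-tools: the function name fit_and_embed is hypothetical, and it assumes the chosen model class exposes setup_anndata, train, and get_latent_representation (true of scVI-style latent variable models, but not necessarily of every model in the library).

```python
def fit_and_embed(model_cls, adata, max_epochs=None, **setup_kwargs):
    """Run the three-step pattern for any model class with the shared API.

    model_cls is assumed to be a class such as scvi.model.SCVI; the helper
    itself imports nothing, so it works with any class of the same shape.
    """
    # 1. Register data: tell the model which fields of `adata` to use.
    model_cls.setup_anndata(adata, **setup_kwargs)

    # 2. Create and train the model (max_epochs=None keeps the model's default).
    model = model_cls(adata)
    model.train(max_epochs=max_epochs)

    # 3. Extract results: here, the low-dimensional latent representation.
    return model, model.get_latent_representation()


# Hypothetical usage, assuming scvi-tools is installed and `adata` holds raw counts:
#   import scvi
#   model, latent = fit_and_embed(scvi.model.SCVI, adata, batch_key="batch")
```

Passing the model class as an argument makes the point of the unified API concrete: switching models changes the class and the setup keywords, not the surrounding code.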
2. Probabilistic Batch Correction and Data Integration
Using a variational autoencoder (VAE) architecture, scvi-tools learns a latent representation of the data while conditioning on known technical factors (e.g., batch effects, donor differences), so that the latent space mainly reflects biological variation. Categorical and continuous covariates can be registered during the setup_anndata stage, allowing flexible modeling of known technical factors.
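Registering covariates might look like the sketch below. The column names "batch", "donor", and "percent_mito" and both function names are assumptions chosen for illustration; the keyword arguments batch_key, categorical_covariate_keys, and continuous_covariate_keys are those of scvi.model.SCVI.setup_anndata.

```python
def missing_columns(available, required):
    """Return the required column names that are absent, in the order given."""
    present = set(available)
    return [name for name in required if name not in present]


def register_with_covariates(adata, batch_key="batch",
                             categorical_keys=("donor",),
                             continuous_keys=("percent_mito",)):
    """Register batch plus extra covariates, then build an untrained scVI model."""
    # Imported here so missing_columns() stays usable without scvi-tools installed.
    import scvi

    # Fail early with a readable message if a covariate column is absent.
    required = [batch_key, *categorical_keys, *continuous_keys]
    absent = missing_columns(adata.obs.columns, required)
    if absent:
        raise KeyError(f"adata.obs is missing columns: {absent}")

    scvi.model.SCVI.setup_anndata(
        adata,
        batch_key=batch_key,
        categorical_covariate_keys=list(categorical_keys),
        continuous_covariate_keys=list(continuous_keys),
    )
    return scvi.model.SCVI(adata)
```

Checking the columns before registration is a design choice of this sketch; it keeps a typo in a column name from surfacing later as a less obvious error.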
3. Uncertainty-Aware Differential Expression Analysis
Unlike traditional frequentist DE methods, scvi-tools provides probabilistic differential expression analysis based on composite hypothesis testing. It estimates both effect sizes and their uncertainty, and it supports a minimum effect-size threshold so that changes too small to matter biologically are not reported as differentially expressed.
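The idea behind the composite hypothesis can be illustrated with a toy calculation: given posterior samples of a gene's log fold change (LFC), the probability of differential expression is the share of samples whose magnitude reaches a threshold delta. The function prob_de below is a simplified illustration of that idea, not the library's implementation. The wrapper de_between_groups is also hypothetical; it only forwards to differential_expression on a trained model, using the mode="change" and delta arguments.

```python
def prob_de(lfc_samples, delta=0.25):
    """Estimate P(|LFC| >= delta) as a fraction of posterior samples."""
    samples = list(lfc_samples)
    if not samples:
        raise ValueError("need at least one posterior sample")
    return sum(abs(x) >= delta for x in samples) / len(samples)


def de_between_groups(model, groupby, group1, group2, delta=0.25):
    """Compare two groups with a trained model's change-mode DE test.

    `model` is assumed to be a trained scvi-tools model such as SCVI;
    `groupby` names a column of adata.obs, e.g. "cell_type".
    """
    return model.differential_expression(
        groupby=groupby,
        group1=group1,
        group2=group2,
        mode="change",  # composite test on the effect size
        delta=delta,    # minimum |LFC| treated as a real change
    )


# Two of four samples exceed the threshold in magnitude.
print(prob_de([0.5, -0.4, 0.1, 0.0], delta=0.25))  # → 0.5
```

Raising delta makes the test stricter: a gene with a consistent but tiny shift gets a low probability rather than a small p-value.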
Frequently Asked Questions
What is the difference between scvi-tools and scanpy? Which should I choose?
scanpy is a standard workflow tool for single-cell analysis, suitable for routine preprocessing, visualization, and basic analyses. scvi-tools focuses on advanced statistical modeling and deep learning methods, ideal for scenarios requiring batch correction, multimodal integration, or uncertainty-quantified analyses. Many users combine them: use scanpy for preprocessing, scvi-tools for advanced modeling, and scanpy again for downstream visualization.
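The combined workflow can be sketched as follows. The function name scanpy_scvi_roundtrip and the fit_latent callback are hypothetical: fit_latent stands for whatever modeling step you use and is assumed to return an array with one row per cell. The sketch also assumes adata.X holds raw counts and that adata.obs has a "batch" column; the scanpy calls are highly_variable_genes, neighbors, and umap.

```python
def scanpy_scvi_roundtrip(adata, fit_latent, batch_key="batch", n_top_genes=2000):
    """Preprocess with scanpy, model with a callback, visualize with scanpy."""
    import scanpy as sc  # third-party; imported lazily

    # scanpy: keep the raw counts and select highly variable genes.
    # The "seurat_v3" flavor works on counts and needs the scikit-misc package.
    adata.layers["counts"] = adata.X.copy()
    sc.pp.highly_variable_genes(
        adata,
        n_top_genes=n_top_genes,
        flavor="seurat_v3",
        layer="counts",
        batch_key=batch_key,
        subset=True,
    )

    # Modeling step (for example an scVI model): store its latent space.
    adata.obsm["X_scVI"] = fit_latent(adata)

    # scanpy again: build the neighbor graph and UMAP on the latent space
    # instead of on PCA of the expression matrix.
    sc.pp.neighbors(adata, use_rep="X_scVI")
    sc.tl.umap(adata)
    return adata
```

Storing the latent representation under adata.obsm is what lets scanpy's downstream tools consume it through use_rep.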
How do I perform batch correction with scVI?
First ensure you use raw count data (not log-normalized), then call scvi.model.SCVI.setup_anndata() to register batch information, create and train the model, and finally use get_latent_representation() to obtain the batch-corrected latent representation. The whole process can be done in a few lines of code, and the model will automatically learn to remove batch effects.
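Those steps can be sketched as below. The function names and the "batch" column are assumptions for illustration, and the raw-count check is a cheap heuristic added by this sketch, not something scvi-tools requires you to call.

```python
def looks_like_raw_counts(values):
    """True if every value is a non-negative whole number."""
    return all(v >= 0 and float(v).is_integer() for v in values)


def integrate_with_scvi(adata, batch_key="batch", n_latent=10, max_epochs=None):
    """Train scVI on raw counts and store the latent space in adata.obsm."""
    # Third-party imports are kept local so the check above has no dependencies.
    import numpy as np
    import scvi

    # Inspect only a small slice; adata.X may be dense or scipy-sparse.
    block = adata.X[:50]
    block = block.toarray() if hasattr(block, "toarray") else np.asarray(block)
    if not looks_like_raw_counts(block.ravel()):
        raise ValueError("adata.X does not look like raw counts")

    # Register the batch column, then create and train the model.
    scvi.model.SCVI.setup_anndata(adata, batch_key=batch_key)
    model = scvi.model.SCVI(adata, n_latent=n_latent)
    model.train(max_epochs=max_epochs)  # None keeps the library default

    # The batch-corrected representation, one row per cell.
    adata.obsm["X_scVI"] = model.get_latent_representation()
    return model
```

If your raw counts live in a layer rather than in adata.X, setup_anndata accepts a layer argument; the sketch would need the check and the registration adjusted accordingly.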
Does scvi-tools support GPUs?
Yes. scvi-tools is built on PyTorch Lightning and can automatically detect and use available GPUs to accelerate training. For large datasets (tens to hundreds of thousands of cells), a GPU can significantly reduce training time. Recent releases provide a CUDA extra, installed with pip install "scvi-tools[cuda]" (the quotes keep shells such as zsh from interpreting the brackets); check the installation guide for the version you use.
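To check what training will run on, or to choose the device explicitly, something like the following sketch may help. Both function names are hypothetical. The accelerator argument to train is assumed to be available, which holds for scvi-tools 1.x; older releases used a different argument.

```python
def pick_accelerator():
    """Return "gpu" when PyTorch reports a usable CUDA device, else "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to accelerate; report CPU.
        return "cpu"
    return "gpu" if torch.cuda.is_available() else "cpu"


def train_on_available_device(model, max_epochs=None):
    """Train `model` on a GPU if one is visible and return the choice made."""
    accelerator = pick_accelerator()
    model.train(max_epochs=max_epochs, accelerator=accelerator)
    return accelerator


print(pick_accelerator())
```

Leaving accelerator at its default lets the library choose for you; setting it explicitly is mainly useful for logging or for forcing CPU runs when debugging.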