lamindb
This skill should be used when working with LaminDB, an open-source data framework for biology that makes data queryable, traceable, reproducible, and FAIR. Use when managing biological datasets (scRNA-seq, spatial, flow cytometry, etc.), tracking computational workflows, curating and validating data with biological ontologies, building data lakehouses, or ensuring data lineage and reproducibility in biological research. Covers data management, annotation, ontologies (genes, cell types, diseases, tissues), schema validation, integrations with workflow managers (Nextflow, Snakemake) and MLOps platforms (W&B, MLflow), and deployment strategies.
Category: Development Tools
LaminDB - Biological Data Management and Provenance Framework
Capabilities Overview
LaminDB is an open-source biological data framework that, through a unified Python API, makes data queryable, traceable, reproducible, and FAIR-compliant. It provides end-to-end data management for biological data such as single-cell sequencing, spatial transcriptomics, and flow cytometry.
Use Cases
Manage multi-modal biological datasets like scRNA-seq, spatial transcriptomics, and flow cytometry. Quickly retrieve and filter experimental data through a unified query interface, and use biological ontologies (genes, cell types, tissues, diseases) for standardized annotation.
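LaminDB's registries are Django models, so queries use Django-style field lookups such as ln.Artifact.filter(key__startswith="scrna/"). The sketch below is a stdlib-only toy that shows how such lookups combine. It is not LaminDB code. filter_records, _matches, and the sample records are hypothetical, and only a few lookup suffixes are supported.

```python
"""Toy in-memory illustration of Django-style field lookups (not LaminDB code)."""
from typing import Any


def _matches(value: Any, op: str, target: Any) -> bool:
    # Lookup suffixes supported by this sketch: exact (default), startswith, in, gt.
    if op == "exact":
        return value == target
    if op == "startswith":
        return isinstance(value, str) and value.startswith(target)
    if op == "in":
        return value in target
    if op == "gt":
        return value is not None and value > target
    raise ValueError(f"unsupported lookup: {op}")


def filter_records(records: list, **lookups: Any) -> list:
    """Return records that satisfy every field__op=value lookup (AND semantics)."""

    def ok(record: dict) -> bool:
        for lookup, target in lookups.items():
            # "key__startswith" -> field "key", op "startswith"; "assay" -> op "exact".
            field, _, op = lookup.partition("__")
            if not _matches(record.get(field), op or "exact", target):
                return False
        return True

    return [r for r in records if ok(r)]


# Hypothetical metadata records for three datasets.
records = [
    {"key": "scrna/pbmc_donor1.h5ad", "assay": "scRNA-seq", "n_obs": 5000},
    {"key": "scrna/pbmc_donor2.h5ad", "assay": "scRNA-seq", "n_obs": 12000},
    {"key": "flow/panel_a.fcs", "assay": "flow cytometry", "n_obs": 80000},
]

hits = filter_records(records, key__startswith="scrna/", n_obs__gt=10000)
print([r["key"] for r in hits])  # ['scrna/pbmc_donor2.h5ad']
```

Unlike Django, this sketch splits only on the first double underscore, so it cannot follow lookups across related records.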
Automatically track complete lineage from raw data to analysis results. Record execution of Jupyter Notebooks, Python scripts, and pipelines such as Nextflow and Snakemake to ensure reproducibility and transparency of data sources.
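LaminDB records this lineage by wrapping a notebook or script in ln.track() and ln.finish(). The stdlib-only toy below models the idea and is not LaminDB's implementation. ToyTracker, ToyRun, and LINEAGE are hypothetical names, and data are plain bytes identified by SHA-256 hashes. Each saved output is linked to the run that produced it, together with that run's inputs and timestamps.

```python
"""Toy provenance recorder illustrating the track/finish idea (not LaminDB code)."""
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


def content_hash(data: bytes) -> str:
    # Identify data by content: identical bytes always get the same ID.
    return hashlib.sha256(data).hexdigest()[:16]


@dataclass
class ToyRun:
    script: str
    started_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)
    finished_at: Optional[datetime] = None


# Hypothetical lineage index: output hash -> the run that produced it.
LINEAGE: dict[str, ToyRun] = {}


class ToyTracker:
    """Hypothetical stand-in for a track()/finish() session around a script."""

    def __init__(self) -> None:
        self.run: Optional[ToyRun] = None

    def track(self, script: str) -> None:
        self.run = ToyRun(script=script)

    def load(self, data: bytes) -> bytes:
        # Record the data as an input of the current run.
        self.run.inputs.append(content_hash(data))
        return data

    def save(self, data: bytes) -> str:
        # Record the data as an output and link it back to the current run.
        h = content_hash(data)
        self.run.outputs.append(h)
        LINEAGE[h] = self.run
        return h

    def finish(self) -> None:
        self.run.finished_at = datetime.now(timezone.utc)
        self.run = None


tracker = ToyTracker()
tracker.track("normalize.py")
raw = tracker.load(b"gene,count\nCD4,12\n")
out_id = tracker.save(raw.replace(b"12", b"1.2"))
tracker.finish()

# "How was this result produced?" -> look up the run behind the output.
run = LINEAGE[out_id]
print(run.script, run.inputs == [content_hash(raw)])  # normalize.py True
```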
Define data schemas for automatic validation, standardize biological terms and experimental metadata, build a FAIR-compliant data lake, support multi-user collaboration and cloud deployment, and integrate seamlessly with MLOps platforms like Weights & Biases, MLflow, and HuggingFace.
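Schema-based validation means checking incoming metadata against declared columns, types, and controlled vocabularies before the data are accepted. The stdlib-only toy below shows that check. It is not LaminDB's schema or curation API. SCHEMA and validate are hypothetical, and the allowed cell types are illustrative.

```python
"""Toy schema validation: columns, types, and controlled vocabularies (not LaminDB code)."""
from typing import Any

# Hypothetical schema: column -> (expected Python type, allowed values or None for any).
SCHEMA: dict[str, tuple[type, Any]] = {
    "sample_id": (str, None),
    "cell_type": (str, {"T cell", "B cell", "monocyte"}),
    "n_genes": (int, None),
}


def validate(rows: list) -> list:
    """Return human-readable problems; an empty list means every row is valid."""
    problems = []
    for i, row in enumerate(rows):
        for col in sorted(SCHEMA.keys() - row.keys()):
            problems.append(f"row {i}: missing column {col!r}")
        for col, value in row.items():
            if col not in SCHEMA:
                problems.append(f"row {i}: unexpected column {col!r}")
                continue
            expected_type, allowed = SCHEMA[col]
            if not isinstance(value, expected_type):
                problems.append(f"row {i}: {col!r} should be {expected_type.__name__}")
            elif allowed is not None and value not in allowed:
                problems.append(f"row {i}: {value!r} is not a valid {col}")
    return problems


rows = [
    {"sample_id": "s1", "cell_type": "T cell", "n_genes": 1834},
    {"sample_id": "s2", "cell_type": "T-cell", "n_genes": "2001"},
]
for problem in validate(rows):
    print(problem)
```

The second row fails twice: "T-cell" is not in the controlled vocabulary, and n_genes arrived as a string. Mapping a synonym like "T-cell" onto its canonical term is the job of ontology-based standardization.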
Core Features
Supports formats such as DataFrame, AnnData, Zarr, Parquet, and more. By versioning data as Artifact objects and managing experimental metadata as Record objects, LaminDB enables layered storage, streaming loading, and unified cross-dataset queries. It can connect to local file systems, AWS S3, Google Cloud Storage, and other storage backends.
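The toy store below sketches key-based versioning with content-hash deduplication. It is loosely modeled on how artifacts are versioned under a stable key, but it is not LaminDB's Artifact class. ToyStore and Version are hypothetical. Saving identical bytes under the same key returns the existing version, and saving changed bytes creates the next version.

```python
"""Toy key-based artifact store with content-hash dedup and versioning (not LaminDB code)."""
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    key: str
    version: int
    digest: str


class ToyStore:
    def __init__(self) -> None:
        self._versions: dict[str, list[Version]] = {}  # key -> version history
        self._blobs: dict[str, bytes] = {}  # digest -> content, stored once

    def save(self, key: str, data: bytes) -> Version:
        digest = hashlib.sha256(data).hexdigest()
        history = self._versions.setdefault(key, [])
        # Identical content under the same key: return the existing version.
        for v in history:
            if v.digest == digest:
                return v
        self._blobs[digest] = data
        new = Version(key=key, version=len(history) + 1, digest=digest)
        history.append(new)
        return new

    def latest(self, key: str) -> Version:
        return self._versions[key][-1]

    def load(self, v: Version) -> bytes:
        return self._blobs[v.digest]


store = ToyStore()
v1 = store.save("scrna/pbmc.h5ad", b"counts-v1")
same = store.save("scrna/pbmc.h5ad", b"counts-v1")  # deduplicated
v2 = store.save("scrna/pbmc.h5ad", b"counts-v2")  # new version
print(v1.version, same is v1, v2.version, store.latest("scrna/pbmc.h5ad").version)
# 1 True 2 2
```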
Includes built-in public ontologies such as Ensembl genes, UniProt proteins, CL cell types, Uberon tissues, and Mondo diseases. Supports term standardization, synonym mapping, hierarchical queries, and custom ontology construction to ensure consistent and interoperable data annotation.
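The toy below illustrates two ontology operations: synonym mapping (standardizing free-text terms to canonical names) and hierarchical queries (walking "is a" links to ancestors). The terms are a small hand-written fragment for illustration, not data loaded from the Cell Ontology. standardize and ancestors are hypothetical helpers.

```python
"""Toy ontology lookup: synonym standardization and ancestor queries (not LaminDB code)."""
from typing import Optional

# Hand-written illustrative fragment: canonical term -> synonyms.
SYNONYMS = {
    "T cell": ["T-cell", "T lymphocyte", "T-lymphocyte"],
    "B cell": ["B-cell", "B lymphocyte"],
    "lymphocyte": [],
    "leukocyte": ["white blood cell"],
}

# Child term -> single parent term ("is a" relation).
PARENT = {
    "T cell": "lymphocyte",
    "B cell": "lymphocyte",
    "lymphocyte": "leukocyte",
}

# Case-insensitive index from every name and synonym to its canonical term.
_LOOKUP = {name.lower(): name for name in SYNONYMS}
_LOOKUP.update({syn.lower(): name for name, syns in SYNONYMS.items() for syn in syns})


def standardize(term: str) -> Optional[str]:
    """Map a term or synonym to its canonical name; None if unknown."""
    return _LOOKUP.get(term.strip().lower())


def ancestors(term: str) -> list:
    """Walk 'is a' links upward from a canonical term."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


print([standardize(t) for t in ["T-cell", "t lymphocyte", "NK cell"]])
# ['T cell', 'T cell', None]
print(ancestors("T cell"))  # ['lymphocyte', 'leukocyte']
```

Real ontologies are directed acyclic graphs in which a term can have several parents. A fuller implementation would traverse a parent list rather than a single parent.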
Automatically capture relationships between code execution and data outputs via ln.track() and ln.finish(), generate visual lineage graphs, and trace data history by source code, input data, creation time, and other dimensions. This makes it easy to answer the key question: "How was this result produced?"
Frequently Asked Questions
What types of biological data is LaminDB suitable for?
LaminDB is designed for biological research and supports single-cell RNA-seq, spatial transcriptomics, flow cytometry, bulk RNA-seq, multimodal data, electronic health records (EHR), and more. It manages these heterogeneous data via a unified interface, supports common biological formats such as AnnData, MuData, SpatialData, TileDB-SOMA, and can be extended to custom data types.
How do I get started with LaminDB?
Installing LaminDB is straightforward: run uv pip install lamindb for a basic install, and add optional extras as needed, for example lamindb[gcp,zarr,fcs]. After installation, run lamin login to authenticate, then initialize an instance with lamin init --storage <path>. SQLite is suitable for development; PostgreSQL is recommended for production.
How does LaminDB integrate with other tools?
LaminDB offers extensive integration options: workflow managers such as Nextflow, Snakemake, and Redun; machine learning and MLOps tools such as Weights & Biases, MLflow, HuggingFace, and scVI-tools; and storage systems including local file systems, AWS S3, Google Cloud Storage, and S3-compatible services. It also supports Git version control, Vitessce visualization, and DuckDB SQL queries, so it fits easily into existing biological data analysis workflows.