histolab
Lightweight WSI tile extraction and preprocessing. Use for basic slide processing: tissue detection, tile extraction, and stain normalization for H&E images. Best for simple pipelines, dataset preparation, and quick tile-based analysis. For advanced spatial proteomics, multiplexed imaging, or deep learning pipelines, use PathML.
Category: Image Processing
Histolab - Whole-slide image processing and tile extraction tool
Capabilities
Histolab is a Python library designed for digital pathology to process whole-slide images (WSI), automatically detect tissue regions, and extract tiles to prepare training data for deep learning models.
Use cases
1. Preparing training data for deep learning
Build training datasets for pathology image AI models. Histolab can batch-extract fixed-size tiles from WSIs in formats such as SVS, TIFF, and NDPI, automatically filter out background regions, and keep only tiles with enough tissue (or, with ScoreTiler, only the highest-scoring tiles), so consistent training datasets can be generated quickly.
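As a minimal sketch of batch dataset preparation, the code below scans a folder for slide files and runs a GridTiler over each one with a TissueMask, so tiles come from all detected tissue rather than only the largest region. The `find_slides` and `extract_dataset` helpers, the extension list, and the one-output-folder-per-slide layout are illustrative assumptions, not part of histolab. Histolab is imported inside `extract_dataset` so the file-scanning part also works where OpenSlide is not installed.

```python
from pathlib import Path

# Hypothetical extension list; adjust to the formats in your cohort.
SLIDE_EXTENSIONS = {".svs", ".tif", ".tiff", ".ndpi"}


def find_slides(root, extensions=SLIDE_EXTENSIONS):
    """Return every slide file under `root` (recursively), sorted for a stable order."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )


def extract_dataset(root, out_dir, tile_size=(512, 512), tissue_percent=80.0):
    """Hypothetical helper: grid-extract tiles from each slide into out_dir/<slide stem>/."""
    # Lazy imports: histolab (and OpenSlide) are only needed when extraction runs.
    from histolab.masks import TissueMask
    from histolab.slide import Slide
    from histolab.tiler import GridTiler

    tiler = GridTiler(
        tile_size=tile_size,
        level=0,                        # highest-resolution pyramid level
        check_tissue=True,              # skip tiles that are mostly background
        tissue_percent=tissue_percent,  # minimum tissue coverage (%) per tile
    )
    for path in find_slides(root):
        slide = Slide(str(path), processed_path=str(Path(out_dir) / path.stem))
        tiler.extract(slide, extraction_mask=TissueMask())
```

Using the slide's file stem as the output folder keeps tiles from different slides apart; if two slides in different subfolders share a name, a different naming scheme would be needed.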
2. Pathology slide visualization and exploration
Quickly browse and inspect slide contents. Supports generating thumbnails, previewing tissue region masks, and visualizing tile extraction locations to help researchers assess slide quality, tissue distribution, and potential issues before formal analysis.
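Tile-location previews are drawn on a downscaled thumbnail, so level-0 tile coordinates have to be divided by the thumbnail's scale factor before they can be outlined. The sketch below shows that mapping under a simple integer-downscale assumption. Histolab's `locate_tiles()` does take a `scale_factor` argument, but the helper names here are hypothetical and histolab's own rounding may differ from the floor division used here.

```python
def to_thumbnail_box(box, scale_factor=32):
    """Map a level-0 box (x_ul, y_ul, x_br, y_br) to thumbnail pixel coordinates."""
    x_ul, y_ul, x_br, y_br = box
    return (
        x_ul // scale_factor,
        y_ul // scale_factor,
        x_br // scale_factor,
        y_br // scale_factor,
    )


def outlines_for_preview(boxes, scale_factor=32):
    """Scale every tile box so it can be outlined on a thumbnail of the slide."""
    return [to_thumbnail_box(box, scale_factor) for box in boxes]


# A 512x512 tile at level-0 origin (1024, 2048) becomes a 16x16 box on a 1/32 thumbnail.
print(to_thumbnail_box((1024, 2048, 1536, 2560)))  # (32, 64, 48, 80)
```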
3. Basic tissue segmentation and tile extraction
Perform simple pathology image preprocessing tasks, including automatic tissue detection, background filtering, three extraction strategies (random sampling, grid coverage, score-based selection), and basic image filtering. Suitable for lightweight pipelines and rapid prototyping.
Key features
1. Multi-format WSI loading and management
Supports mainstream pathology slide formats (SVS, TIFF, NDPI, etc.), built on OpenSlide. Can read slide metadata (magnification, dimensions, pyramid levels), generate thumbnails for quick preview, and extract image data from specified coordinates.
2. Intelligent tissue detection and masks
Automatically identifies tissue regions and generates binary masks that filter out glass background and annotations. Three mask types are provided: TissueMask detects all tissue regions, BiggestTissueBoxMask focuses on the largest tissue region, and BinaryMask is the base class to subclass for custom rules. Mask results can be previewed before extraction.
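The core idea behind automatic tissue detection is that glass background is bright while stained tissue is darker, so a global threshold on grayscale intensity separates the two. The plain-Python sketch below illustrates that idea with Otsu's method on 8-bit gray levels. It is an illustration, not histolab's code: histolab's TissueMask adds further filtering (such as morphological cleanup) on top of thresholding, and the function names here are hypothetical.

```python
def otsu_threshold(values):
    """Return the gray level t that best separates values <= t from values > t (Otsu)."""
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(level * count for level, count in enumerate(hist))

    w_dark = 0    # number of pixels at or below t
    sum_dark = 0  # sum of their gray levels
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        w_bright = total - w_dark
        if w_dark == 0:
            continue
        if w_bright == 0:
            break
        mean_dark = sum_dark / w_dark
        mean_bright = (sum_all - sum_dark) / w_bright
        # Between-class variance, up to a constant factor of 1 / total**2.
        between_var = w_dark * w_bright * (mean_dark - mean_bright) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def tissue_mask(gray_rows):
    """Mark pixels at or below the Otsu threshold as tissue (glass background is bright)."""
    flat = [v for row in gray_rows for v in row]
    t = otsu_threshold(flat)
    return [[v <= t for v in row] for row in gray_rows]


print(tissue_mask([[30, 240], [40, 230]]))  # [[True, False], [True, False]]
```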
3. Flexible tile extraction strategies
Provides three tilers (RandomTiler, GridTiler, ScoreTiler) and supports previewing tile locations before extraction, setting tissue coverage thresholds, and generating CSV reports with tile metadata (ScoreTiler).
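A tissue coverage threshold amounts to computing the share of tissue pixels in a tile's mask and comparing it to a percentage. The sketch below assumes a 2-D boolean mask per tile and a `>=` comparison; the helper names are hypothetical, and histolab's own `check_tissue` / `tissue_percent` logic runs its own tissue detection on each tile and may use a slightly different comparison.

```python
def tissue_percent(mask_tile):
    """Percentage of True (tissue) pixels in a 2-D boolean tile mask."""
    flat = [px for row in mask_tile for px in row]
    if not flat:
        raise ValueError("empty tile mask")
    return 100.0 * sum(flat) / len(flat)


def keep_tile(mask_tile, tissue_percent_threshold=80.0):
    """Hypothetical filter in the spirit of check_tissue: keep tiles with enough tissue."""
    return tissue_percent(mask_tile) >= tissue_percent_threshold


mostly_tissue = [[True, True, True, True, False]] * 4  # 16 of 20 pixels are tissue
print(tissue_percent(mostly_tissue), keep_tile(mostly_tissue))  # 80.0 True
```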
Frequently Asked Questions
What is Histolab? What scenarios is it suitable for?
Histolab is a lightweight Python library for processing whole-slide images (WSI) in digital pathology. It provides tissue detection, tile extraction, and basic preprocessing functions, and is particularly suitable for simple pipelines, dataset preparation, and quick tile-based analysis of H&E slides.
Important note: Histolab is positioned as a basic tool. If you need to handle spatial proteomics, multiplex fluorescence imaging, or build complex deep learning pipelines, consider using the more advanced PathML toolkit.
Which pathology slide file formats does Histolab support?
Histolab is built on OpenSlide and supports mainstream pathology slide formats, including SVS, TIFF, and NDPI.
After loading a slide with the Slide class, you can check image dimensions and pyramid levels via slide.dimensions and slide.levels.
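Pyramid levels are downsampled copies of the level-0 image, which is why `level=0` means the highest resolution. The sketch below approximates each level's size from the level-0 dimensions and a list of downsample factors, and picks the coarsest level that does not exceed a target downsample. The factors (1, 4, 16) are only an assumed example; real values come from the slide file via OpenSlide, and real level sizes may not be exact integer divisions.

```python
def level_dimensions(base_dims, downsamples):
    """Approximate (width, height) of each pyramid level from level-0 size and downsample factors."""
    width, height = base_dims
    return [(width // d, height // d) for d in downsamples]


def best_level_for_downsample(downsamples, target):
    """Index of the coarsest level whose downsample does not exceed `target`.

    Assumes `downsamples` is sorted ascending; returns 0 if no level qualifies.
    """
    best = 0
    for level, d in enumerate(downsamples):
        if d <= target:
            best = level
    return best


print(level_dimensions((40000, 30000), (1, 4, 16)))
# [(40000, 30000), (10000, 7500), (2500, 1875)]
print(best_level_for_downsample((1, 4, 16), 8))  # 1
```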
What’s the difference between RandomTiler, GridTiler and ScoreTiler?
These are the three tile extraction strategies provided by Histolab, suitable for different use cases:
| Extractor | Use cases | Key parameters | Features |
|---|---|---|---|
| RandomTiler | Exploratory analysis, training data sampling | n_tiles (number of tiles), seed (random seed) | Fast random sampling, reproducible results |
| GridTiler | Full tissue coverage, spatial analysis | pixel_overlap (overlap in pixels) | Systematic grid extraction, supports sliding windows |
| ScoreTiler | High-quality sample selection, quality control | scorer (scoring function), n_tiles (number of top-scoring tiles to keep) | Selects the best tiles based on metrics such as nuclei density |
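To make the differences in the table concrete, here is a plain-Python sketch of the three selection ideas on a bare width-by-height canvas. It is not histolab's implementation: the real tilers also restrict tiles to an extraction mask and apply the tissue check, and all function names here are hypothetical. The `scorer` argument stands in for something like NucleiScorer.

```python
import random


def grid_origins(length, tile, overlap=0):
    """Start offsets along one axis for tiles of size `tile` overlapping by `overlap` pixels."""
    stride = tile - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the tile size")
    return list(range(0, length - tile + 1, stride))


def grid_tiles(width, height, tile_size, pixel_overlap=0):
    """GridTiler-style coverage: every full tile on a regular (optionally overlapping) grid."""
    tw, th = tile_size
    return [
        (x, y, x + tw, y + th)
        for y in grid_origins(height, th, pixel_overlap)
        for x in grid_origins(width, tw, pixel_overlap)
    ]


def random_tiles(width, height, tile_size, n_tiles, seed):
    """RandomTiler-style sampling: a fixed seed gives the same tiles on every run."""
    rng = random.Random(seed)
    tw, th = tile_size
    boxes = []
    for _ in range(n_tiles):
        x = rng.randint(0, width - tw)
        y = rng.randint(0, height - th)
        boxes.append((x, y, x + tw, y + th))
    return boxes


def top_scoring(boxes, scorer, n_tiles):
    """ScoreTiler-style selection: keep the n boxes with the highest score."""
    return sorted(boxes, key=scorer, reverse=True)[:n_tiles]


# A 256-pixel overlap turns a 2 x 2 grid of 512-pixel tiles into a 3 x 3 sliding window.
print(len(grid_tiles(1024, 1024, (512, 512))))                     # 4
print(len(grid_tiles(1024, 1024, (512, 512), pixel_overlap=256)))  # 9
```

The stride of `tile - overlap` is what makes GridTiler's `pixel_overlap` behave like a sliding window, while the seeded generator is what makes random sampling reproducible.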
It is recommended to preview tile locations using the locate_tiles() method before performing full extraction.
How to use Histolab to prepare training data for deep learning?
A basic workflow for preparing training data using Histolab:
```python
from histolab.slide import Slide
from histolab.tiler import RandomTiler

# 1. Load the slide; extracted tiles are written under processed_path
slide = Slide("slide.svs", processed_path="output/")

# 2. Configure the tiler
tiler = RandomTiler(
    tile_size=(512, 512),  # tile size in pixels (width, height)
    n_tiles=100,           # number of tiles to extract
    level=0,               # pyramid level (0 = highest resolution)
    seed=42,               # random seed, for reproducible sampling
    check_tissue=True,     # discard tiles without enough tissue
    tissue_percent=80.0,   # minimum tissue coverage (%) per tile
)

# 3. Preview tile locations on the slide thumbnail (returns a PIL image)
preview = tiler.locate_tiles(slide)
preview.save("tile_preview.png")

# 4. Run the extraction
tiler.extract(slide)
```

To select cell-dense regions, use ScoreTiler together with NucleiScorer; ScoreTiler's extract() also accepts a report_path argument for writing a CSV report.
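A minimal sketch of the ScoreTiler and NucleiScorer combination, assuming histolab's `ScoreTiler(scorer, tile_size, n_tiles, level, check_tissue, tissue_percent)` constructor and its `extract(slide, extraction_mask, report_path)` method. The wrapper function, its paths, and its defaults are hypothetical; the imports are deferred so the function can be defined even where histolab is not installed.

```python
def extract_dense_tiles(slide_path, processed_path, n_tiles=50,
                        report_path="dense_tiles_report.csv"):
    """Hypothetical wrapper: save the n tiles with the highest nuclei score."""
    if n_tiles < 0:
        raise ValueError("n_tiles must be >= 0")
    # Lazy imports so this module loads even where histolab/OpenSlide are missing.
    from histolab.masks import TissueMask
    from histolab.scorer import NucleiScorer
    from histolab.slide import Slide
    from histolab.tiler import ScoreTiler

    slide = Slide(slide_path, processed_path=processed_path)
    tiler = ScoreTiler(
        scorer=NucleiScorer(),  # scores each tile by its nuclei content
        tile_size=(512, 512),
        n_tiles=n_tiles,        # keep only the top-n tiles (0 keeps all)
        level=0,
        check_tissue=True,
        tissue_percent=80.0,
    )
    # Score tiles over all detected tissue and write a CSV report of the saved tiles.
    tiler.extract(slide, extraction_mask=TissueMask(), report_path=report_path)
```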
Which should I choose, Histolab or PathML?
The two tools target different needs; choose based on your scenario:
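Choose Histolab if you need basic H&E slide processing (tissue detection, tile extraction, stain normalization), simple pipelines, dataset preparation, or quick tile-based analysis.
Choose PathML if you work with spatial proteomics or multiplexed imaging, or need complex, production-grade deep learning pipelines.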
They can complement each other: Histolab for quick prototyping and basic processing, PathML for production-grade complex pipelines.
Is Histolab open source? Is it free to use?
Yes. Histolab is licensed under Apache-2.0, allowing free use, modification, and distribution, including commercial use. Installation:
```shell
uv pip install histolab
```
or use the regular pip:
```shell
pip install histolab
```
This skill is maintained by K-Dense Inc. and includes complete reference documentation and example code.
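After installing, one way to confirm which histolab version the environment actually picked up is the standard library's `importlib.metadata`. The helper name below is hypothetical.

```python
from importlib import metadata


def installed_version(dist_name="histolab"):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string, or None if histolab is not installed in this environment.
print(installed_version())
```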