histolab

Lightweight WSI tile extraction and preprocessing. Use for basic slide processing: tissue detection, tile extraction, and stain normalization for H&E images. Best for simple pipelines, dataset preparation, and quick tile-based analysis. For advanced spatial proteomics, multiplexed imaging, or deep learning pipelines, use pathml.

Install

Download and extract to your skills directory

Copy command and send to OpenClaw for auto-install:

Download and install this skill https://openskills.cc/api/download?slug=k-dense-ai-scientific-skills-histolab&locale=en&source=copy

Histolab - Whole-slide image processing and tile extraction tool

Capabilities


Histolab is a Python library designed for digital pathology to process whole-slide images (WSI), automatically detect tissue regions, and extract tiles to prepare training data for deep learning models.

Use cases

1. Preparing training data for deep learning


Build training datasets for pathology image AI models. Histolab can batch-extract standard-size tiles from WSIs in formats such as SVS, TIFF, and NDPI, automatically filter out background regions, select high-quality samples by tissue density, and quickly generate balanced training datasets.

2. Pathology slide visualization and exploration


Quickly browse and inspect slide contents. Supports generating thumbnails, previewing tissue region masks, and visualizing tile extraction locations to help researchers assess slide quality, tissue distribution, and potential issues before formal analysis.

3. Basic tissue segmentation and tile extraction


Perform simple pathology image preprocessing tasks, including automatic tissue detection, background filtering, tile extraction with one of three strategies (random sampling, grid coverage, score-based selection), and basic image filtering. Suitable for lightweight pipelines and rapid prototyping.

Key features

1. Multi-format WSI loading and management


Supports mainstream pathology slide formats (SVS, TIFF, NDPI, etc.), built on OpenSlide. Can read slide metadata (magnification, dimensions, pyramid levels), generate thumbnails for quick preview, and extract image data from specified coordinates.

2. Intelligent tissue detection and masks


Automatically identify tissue regions and generate binary masks, filtering glass background and annotations. Provides three mask types: TissueMask to detect all tissue regions, BiggestTissueBoxMask focusing on the largest tissue block, and BinaryMask for custom rules. Supports previewing mask results.
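
To make the idea of a binary mask and a tissue coverage threshold concrete, here is a minimal pure-Python sketch. It is not histolab's algorithm (the library builds its masks from its own image filters); the function names `tissue_mask` and `tissue_percent` and the fixed brightness cutoff of 220 are assumptions chosen purely for illustration.

```python
def tissue_mask(gray, background_threshold=220):
    """Mark pixels darker than the threshold as tissue.

    `gray` is a list of rows of 0-255 grayscale values. Bright pixels are
    treated as glass background. The cutoff is an illustrative assumption.
    """
    return [[pixel < background_threshold for pixel in row] for row in gray]


def tissue_percent(mask):
    """Percentage of True (tissue) pixels in a binary mask."""
    total = sum(len(row) for row in mask)
    tissue = sum(sum(row) for row in mask)
    return 100.0 * tissue / total if total else 0.0


# Toy 2x4 "tile": bright values are background, dark values are tissue.
toy_tile = [
    [255, 255, 100, 90],
    [250, 120, 80, 70],
]
coverage = tissue_percent(tissue_mask(toy_tile))

# A tile would be kept only if its coverage reaches the chosen threshold.
keep_tile = coverage >= 80.0
```

With a threshold of 80%, this toy tile would be rejected, which is the same kind of decision a tissue coverage threshold makes during tile extraction.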

3. Flexible tile extraction strategies


  • RandomTiler: Randomly sample a specified number of tiles, suitable for exploratory analysis and training data sampling

  • GridTiler: Grid-based full-coverage extraction, suitable for analyses requiring complete tissue coverage

  • ScoreTiler: Select high-quality tiles based on a scoring function, supports NucleiScorer, CellularityScorer, and custom scorers

The tilers also support previewing tile locations before extraction and setting tissue coverage thresholds; ScoreTiler can additionally write a CSV report of the extracted tiles and their scores.
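
The three strategies can be sketched in plain Python as ways of choosing tile origins. This is a conceptual illustration, not histolab's implementation: the helper names (`grid_tile_origins`, `random_tile_origins`, `top_scoring`) are hypothetical, the whole image is treated as tissue, and the way overlap is handled here is an assumption.

```python
import random


def grid_tile_origins(slide_size, tile_size, pixel_overlap=0):
    """Top-left corners of tiles laid out on a regular grid."""
    width, height = slide_size
    tile_w, tile_h = tile_size
    step_x, step_y = tile_w - pixel_overlap, tile_h - pixel_overlap
    if step_x <= 0 or step_y <= 0:
        raise ValueError("pixel_overlap must be smaller than the tile size")
    return [
        (x, y)
        for y in range(0, height - tile_h + 1, step_y)
        for x in range(0, width - tile_w + 1, step_x)
    ]


def random_tile_origins(slide_size, tile_size, n_tiles, seed=None):
    """Randomly placed tile corners; the same seed gives the same tiles."""
    rng = random.Random(seed)
    width, height = slide_size
    tile_w, tile_h = tile_size
    return [
        (rng.randint(0, width - tile_w), rng.randint(0, height - tile_h))
        for _ in range(n_tiles)
    ]


def top_scoring(origins, score_fn, n_tiles):
    """Keep the n_tiles candidates with the highest score."""
    return sorted(origins, key=score_fn, reverse=True)[:n_tiles]


grid = grid_tile_origins((1024, 1024), (512, 512))
overlapping = grid_tile_origins((1024, 1024), (512, 512), pixel_overlap=256)
sampled = random_tile_origins((2048, 2048), (512, 512), n_tiles=5, seed=42)
```

A score-based tiler combines the grid with a ranking step: every grid candidate is scored (for example by nuclei density) and only the best ones are kept.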

    Frequently Asked Questions

    What is Histolab? What scenarios is it suitable for?


    Histolab is a lightweight Python library for processing whole-slide images (WSI) in digital pathology. It provides tissue detection, tile extraction, and basic preprocessing functions, and is particularly suitable for:

  • Preparing pathology image training datasets for deep learning models

  • Batch-processing slides and extracting standard-size tiles

  • Rapid prototyping and simple image preprocessing pipelines

    Important note: Histolab is positioned as a basic tool. If you need to handle spatial proteomics, multiplex fluorescence imaging, or build complex deep learning pipelines, consider using the more advanced PathML toolkit.

    Which pathology slide file formats does Histolab support?


    Histolab is built on OpenSlide and supports mainstream pathology slide formats, including:

  • SVS (Aperio/Leica scanners)

  • TIFF/TIF (general format)

  • NDPI (Hamamatsu NanoZoomer)

  • SCN (Leica)

  • MIRAX (3DHistech)

  • VMS/VMU (Hamamatsu)

  • SVSLIDE (Sakura)

    After loading a slide with the Slide class, you can check image dimensions and pyramid levels via slide.dimensions and slide.levels.
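
A minimal sketch of inspecting a slide through these attributes. `summarize_slide` and `load_and_summarize` are hypothetical helper names, and the sketch assumes `Slide` exposes `name`, `dimensions` (width and height at level 0) and `levels`; check this against your installed histolab version. The histolab import is deferred so the first helper can be tried on a stand-in object without a slide file.

```python
from types import SimpleNamespace


def summarize_slide(slide):
    """Collect basic metadata from any object with name, dimensions and levels."""
    width, height = slide.dimensions
    return {
        "name": slide.name,
        "width": width,
        "height": height,
        "n_levels": len(slide.levels),
    }


def load_and_summarize(path, processed_path="processed"):
    """Open a real slide with histolab and summarize it (requires histolab)."""
    from histolab.slide import Slide  # imported lazily on purpose

    return summarize_slide(Slide(path, processed_path=processed_path))


# Stand-in object with made-up values, used only to show the output shape.
fake_slide = SimpleNamespace(name="demo", dimensions=(40000, 30000), levels=[0, 1, 2])
summary = summarize_slide(fake_slide)
```

For a real file, `load_and_summarize("slide.svs")` would return the same kind of dictionary, with values read from the slide itself.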

    What’s the difference between RandomTiler, GridTiler and ScoreTiler?


    These are the three tile extraction strategies provided by Histolab, suitable for different use cases:

    Extractor   | Use cases                                      | Key parameters                                | Features
    RandomTiler | Exploratory analysis, training data sampling   | n_tiles (number of tiles), seed (random seed) | Fast random sampling, reproducible results
    GridTiler   | Full tissue coverage, spatial analysis         | pixel_overlap (overlap in pixels)             | Systematic grid extraction, supports sliding windows
    ScoreTiler  | High-quality sample selection, quality control | scorer (scoring function)                     | Selects best tiles based on metrics like nuclei density

    It is recommended to preview tile locations using the locate_tiles() method before performing full extraction.

    How to use Histolab to prepare training data for deep learning?


    A basic workflow for preparing training data using Histolab:

    from histolab.slide import Slide
    from histolab.tiler import RandomTiler
    
    # 1. Load slide; extracted tiles are written under processed_path
    slide = Slide("slide.svs", processed_path="output/")
    
    # 2. Configure tiler
    tiler = RandomTiler(
        tile_size=(512, 512),  # tile size in pixels
        n_tiles=100,           # number of tiles to extract
        level=0,               # pyramid level (0 = highest resolution)
        seed=42,               # random seed (reproducible)
        check_tissue=True,     # check tissue coverage
        tissue_percent=80.0    # minimum tissue coverage threshold (%)
    )
    
    # 3. Preview tile locations; returns an image of the scaled-down slide
    #    with the outlines of the configured tiles drawn on it
    preview = tiler.locate_tiles(slide)
    
    # 4. Perform extraction; tiles are saved under processed_path
    tiler.extract(slide)

    To select cell-dense regions, use ScoreTiler together with NucleiScorer.
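
A hedged sketch of that combination follows. The helper names `score_tiler_settings` and `extract_nuclei_rich_tiles` are hypothetical, and the keyword arguments (`scorer`, `tile_size`, `n_tiles`, `level`, and `report_path` on `extract`) reflect my understanding of the histolab 0.6 API; verify them against the version you have installed. The histolab imports are deferred so the settings helper can be used on its own.

```python
def score_tiler_settings(tile_size=(512, 512), n_tiles=100, level=0):
    """Keyword arguments intended for ScoreTiler, kept separate for easy logging."""
    return {"tile_size": tile_size, "n_tiles": n_tiles, "level": level}


def extract_nuclei_rich_tiles(slide_path, out_dir, report_path="tiles_report.csv"):
    """Extract the highest-scoring tiles by nuclei content (requires histolab)."""
    from histolab.scorer import NucleiScorer
    from histolab.slide import Slide
    from histolab.tiler import ScoreTiler

    slide = Slide(slide_path, processed_path=out_dir)
    tiler = ScoreTiler(scorer=NucleiScorer(), **score_tiler_settings())
    # Assumed: ScoreTiler.extract can write a CSV report of tiles and scores.
    tiler.extract(slide, report_path=report_path)


settings = score_tiler_settings(n_tiles=50)
```

Calling `extract_nuclei_rich_tiles("slide.svs", "output/")` would then save the selected tiles under `output/` together with the report.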

    Which should I choose, Histolab or PathML?


    The two tools target different needs; choose based on your scenario:

    Choose Histolab if you:

  • Need to quickly extract tiles for dataset preparation

  • Perform basic tissue detection and filtering tasks

  • Want to build simple, lightweight preprocessing pipelines

  • Only need to handle H&E stained images

    Choose PathML if you:

  • Need to handle spatial proteomics or multiplex fluorescence imaging

  • Build complex end-to-end deep learning pipelines

  • Require advanced quality control and standardization features

  • Need to process multiple staining types and more complex image analyses

    They can complement each other: Histolab for quick prototyping and basic processing, PathML for production-grade complex pipelines.

    Is Histolab open source? Is it free to use?


    Yes. Histolab is licensed under Apache-2.0, allowing free use, modification, and distribution, including commercial use. Installation:

    uv pip install histolab

    or use the regular pip:

    pip install histolab

    This skill is maintained by K-Dense Inc. and provides complete reference documentation and example code.