imaging-data-commons
Query and download public cancer imaging data from NCI Imaging Data Commons using idc-index. Use for accessing large-scale radiology (CT, MR, PET) and pathology datasets for AI training or research. No authentication required. Query by metadata, visualize in browser, check licenses.
Category: AI Skill Development
Download and extract to your skills directory
Copy command and send to OpenClaw for auto-install:
Download and install this skill https://openskills.cc/api/download?slug=k-dense-ai-scientific-skills-imaging-data-commons&locale=en&source=copy
Imaging Data Commons: Free cancer imaging data search and download
Overview
Imaging Data Commons (IDC) is a public cancer imaging database maintained by the U.S. National Cancer Institute (NCI). Using the idc-index Python package, you can query and download radiology data (CT, MRI, PET) and pathology slide data free of charge, without any authentication.
Use cases
1. Medical AI model training
Obtain large-scale labeled or unlabeled medical imaging datasets for deep learning projects. You can filter by cancer type, imaging modality (CT/MR/PET), anatomical site, and other metadata, then download training data for tasks such as classification, segmentation, and detection.
2. Medical imaging research
Researchers can obtain standardized DICOM-format imaging data for image processing algorithm development, radiomics studies, multicenter analyses, etc. The data include complete metadata and clinical information (for some collections), supporting linkage with clinical data.
3. Quick data preview and filtering
Before bulk downloading, you can preview image series in a browser-based viewer to avoid downloading unsuitable data. You can also filter with SQL queries, estimate the download size, and check the license type of the data.
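The size check described above can be sketched as follows. This is a minimal example, assuming the idc-index package is installed and that the index table exposes a series_size_MB column. The helper names size_query and estimate_download are hypothetical, and the collection ID in the usage comment is only an illustration.

```python
def size_query(collection_id: str, modality: str) -> str:
    """Build SQL that totals series count and size for one collection and modality.

    Hypothetical helper. Values are interpolated directly into the SQL text,
    so pass trusted input only.
    """
    return (
        "SELECT COUNT(*) AS n_series, "
        "ROUND(SUM(series_size_MB) / 1024, 2) AS total_GB "
        "FROM index "
        f"WHERE collection_id = '{collection_id}' AND Modality = '{modality}'"
    )


def estimate_download(collection_id: str, modality: str):
    """Run the size query against the local IDC index (needs idc-index installed)."""
    # Imported lazily so that size_query can be used without the package.
    from idc_index import IDCClient

    client = IDCClient()
    # sql_query is expected to return a pandas DataFrame with one summary row.
    return client.sql_query(size_query(collection_id, modality))


# Usage (queries metadata only, no image files are downloaded):
# print(estimate_download("nlst", "CT"))
```

Running the summary first makes it easy to narrow the filter before committing disk space to a download.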
Core features
1. SQL metadata queries
Query IDC index tables using standard SQL syntax, filtering by collection_id, patient, study, series, and other dimensions. You can retrieve metadata fields such as imaging modality, anatomical site, device vendor, and study date, and JOIN across tables to obtain extended information such as cancer type and analysis results.
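A cross-table JOIN of the kind described above might look like the sketch below. It assumes idc-index is installed, that collections_index must be fetched before it can be queried, and that it carries a CancerTypes column. The helper names cancer_type_query and find_series are hypothetical.

```python
def cancer_type_query(keyword: str, modality: str) -> str:
    """Build SQL joining the main index to collections_index on collection_id.

    Hypothetical helper. Inputs are interpolated directly, so pass trusted values.
    """
    return (
        "SELECT i.collection_id, i.PatientID, i.SeriesInstanceUID, i.BodyPartExamined "
        "FROM index i "
        "JOIN collections_index c ON i.collection_id = c.collection_id "
        f"WHERE c.CancerTypes LIKE '%{keyword}%' AND i.Modality = '{modality}'"
    )


def find_series(keyword: str, modality: str):
    """Return matching series as a DataFrame (needs idc-index installed)."""
    from idc_index import IDCClient  # lazy import keeps the query builder standalone

    client = IDCClient()
    # The collections table is not loaded by default; fetch it before joining.
    client.fetch_index("collections_index")
    return client.sql_query(cancer_type_query(keyword, modality))


# Usage:
# lung_ct = find_series("Lung", "CT")
# print(lung_ct.head())
```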
2. Bulk DICOM download
Bulk download DICOM files by specifying collection_id, PatientID, StudyInstanceUID, or SeriesInstanceUID. Downloads support custom directory structure templates and large batches, and files are retrieved automatically from public buckets on AWS S3 or Google Cloud Storage.
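The download step above can be sketched as follows, assuming idc-index is installed and that IDCClient.download_from_selection accepts the keyword arguments shown (downloadDir, dry_run, dirTemplate, and the four filter names). The helpers download_kwargs and download are hypothetical, and the directory template is only one possible layout.

```python
def download_kwargs(collection_id=None, patient_id=None, study_uid=None, series_uid=None) -> dict:
    """Map the filters named in the text to download keyword arguments.

    Unset filters are dropped. This sketch refuses an empty selection so that
    a call cannot accidentally target the whole archive.
    """
    candidates = {
        "collection_id": collection_id,
        "patientId": patient_id,
        "studyInstanceUID": study_uid,
        "seriesInstanceUID": series_uid,
    }
    selected = {key: value for key, value in candidates.items() if value is not None}
    if not selected:
        raise ValueError("pass at least one filter")
    return selected


def download(download_dir: str, dry_run: bool = True, **filters) -> None:
    """Download (or, with dry_run=True, only size up) the selected DICOM files."""
    from idc_index import IDCClient  # lazy import; requires idc-index

    client = IDCClient()
    client.download_from_selection(
        downloadDir=download_dir,
        dry_run=dry_run,
        # Assumed template syntax: %-prefixed index column names as path parts.
        dirTemplate="%collection_id/%PatientID/%Modality_%SeriesInstanceUID",
        **download_kwargs(**filters),
    )


# Usage (the series UID is a placeholder, not a real identifier):
# download("./idc_data", dry_run=False, series_uid="1.2.3")
```

Defaulting dry_run to True is a design choice of this sketch: a call has to opt in explicitly before any files are transferred.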
3. Browser visualization and license checking
Generate OHIF or Slim viewer links to preview images directly in the browser without downloading. Query the license type for each dataset (CC BY 4.0 allows commercial use; CC BY-NC restricts commercial use), and generate citations in standard academic formats.
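A license check and a viewer link for one collection could be combined as in the sketch below. It assumes idc-index is installed, that the index has a license_short_name column with values shaped like "CC BY 4.0" or "CC BY-NC 4.0", and that IDCClient.get_viewer_URL accepts a seriesInstanceUID argument. The helper names are hypothetical, and the commercial-use test is a rough heuristic on the short name rather than a reading of the license text.

```python
def allows_commercial_use(license_short_name: str) -> bool:
    """Heuristic: True for CC BY variants without a NonCommercial (NC) clause."""
    parts = license_short_name.upper().replace("-", " ").split()
    return parts[:2] == ["CC", "BY"] and "NC" not in parts


def licenses_and_preview(collection_id: str):
    """Return (license summary DataFrame, viewer URL or None) for a collection."""
    from idc_index import IDCClient  # lazy import; requires idc-index

    client = IDCClient()
    licenses = client.sql_query(
        "SELECT license_short_name, COUNT(*) AS n_series FROM index "
        f"WHERE collection_id = '{collection_id}' GROUP BY license_short_name"
    )
    first = client.sql_query(
        f"SELECT SeriesInstanceUID FROM index WHERE collection_id = '{collection_id}' LIMIT 1"
    )
    if first.empty:
        return licenses, None  # unknown collection: nothing to preview
    url = client.get_viewer_URL(seriesInstanceUID=first["SeriesInstanceUID"].iloc[0])
    return licenses, url


# Usage:
# licenses, url = licenses_and_preview("nlst")
# print(licenses); print(url)
```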
Frequently Asked Questions
Is Imaging Data Commons data completely free? Can it be used for commercial projects?
Access to IDC data is completely free and requires no registration. However, you must comply with the license terms of each dataset. Approximately 97% of the data use a CC BY license, which allows commercial use (with attribution); about 3% use CC BY-NC, which prohibits commercial use. Before use, be sure to check the license type via the license_short_name field.
How to find imaging data for a specific cancer type (e.g., lung cancer)?
Cancer type information is stored in the collections_index table. You need to run client.fetch_index("collections_index") first, then perform a JOIN query:
SELECT i.* FROM index i JOIN collections_index c ON i.collection_id = c.collection_id WHERE c.CancerTypes LIKE '%Lung%' AND i.Modality = 'CT'
How to process downloaded DICOM files with Python?
You can read them using the pydicom library:
import pydicom
ds = pydicom.dcmread('file.dcm')
image = ds.pixel_array
For CT/MRI series, you can sort the slices by ImagePositionPatient and stack them into a 3D array, or use SimpleITK's ImageSeriesReader to read the full series directly.
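The sort-and-stack approach can be sketched as follows. This is a minimal example, assuming pydicom and NumPy are installed, that every file in the directory belongs to one series with the same orientation and image size, and that the files use a .dcm extension. The function names are hypothetical. Slices are ordered by projecting ImagePositionPatient onto the slice normal, which also handles non-axial acquisitions.

```python
from pathlib import Path


def slice_position(ds) -> float:
    """Distance of a slice along the normal of its image plane.

    The normal is the cross product of the row and column direction cosines
    stored in ImageOrientationPatient.
    """
    row = [float(v) for v in ds.ImageOrientationPatient[:3]]
    col = [float(v) for v in ds.ImageOrientationPatient[3:6]]
    normal = (
        row[1] * col[2] - row[2] * col[1],
        row[2] * col[0] - row[0] * col[2],
        row[0] * col[1] - row[1] * col[0],
    )
    position = [float(v) for v in ds.ImagePositionPatient]
    return sum(n * p for n, p in zip(normal, position))


def sort_slices(datasets):
    """Return datasets ordered by their position along the slice normal."""
    return sorted(datasets, key=slice_position)


def load_volume(series_dir: str):
    """Read one series directory into a 3D array shaped (slices, rows, columns)."""
    import numpy as np  # lazy imports: only needed when files are actually read
    import pydicom

    datasets = [pydicom.dcmread(path) for path in Path(series_dir).glob("*.dcm")]
    # pixel_array holds stored values; rescale slope/intercept are not applied here.
    return np.stack([ds.pixel_array for ds in sort_slices(datasets)])


# Usage:
# volume = load_volume("./idc_data/some_series")
# print(volume.shape)
```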
What's the difference between IDC and TCIA?
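TCIA (The Cancer Imaging Archive) is a long-running NCI-funded archive that remains in operation; it is not a retired predecessor of IDC. IDC is a separate cloud-based platform within NCI's Cancer Research Data Commons that hosts public DICOM collections, including many that originate from TCIA. IDC offers more modern APIs (idc-index, BigQuery, DICOMweb), direct access to cloud storage, and improved metadata indexing. For programmatic access to public collections that are available in both, IDC is usually the more convenient starting point.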