scikit-survival
Comprehensive toolkit for survival analysis and time-to-event modeling in Python using scikit-survival. Use this skill when working with censored survival data, performing time-to-event analysis, fitting Cox models, Random Survival Forests, Gradient Boosting models, or Survival SVMs, evaluating survival predictions with concordance index or Brier score, handling competing risks, or implementing any survival analysis workflow with the scikit-survival library.
Category: Development Tools
scikit-survival: Complete Python Survival Analysis Toolkit
Overview
scikit-survival is a Python library for survival analysis built on top of scikit-learn. It covers the workflow from data preprocessing to model evaluation and provides Cox models, random survival forests, gradient boosting, and survival support vector machines.
Use Cases
1. Medicine and Biomedical Research
Handles clinical trial data, patient survival time analysis, and disease prognosis modeling. The library's estimators are built for right-censored data, where a subject's follow-up ends before the event is observed. This makes it suitable for cancer research, cardiovascular studies, and other time-to-event analysis scenarios.
2. High-dimensional Feature Selection and Modeling
When the number of features exceeds the number of samples, use CoxnetSurvivalAnalysis (elastic net regularization) for feature selection and dimensionality reduction. This is suitable for gene expression data analysis, molecular biomarker screening, and similar scenarios.
3. Modeling Complex Nonlinear Relationships
Use random survival forests (RandomSurvivalForest) or gradient boosting survival analysis (GradientBoostingSurvivalAnalysis) to capture nonlinear relationships and feature interactions that a linear Cox model cannot represent. These models fit cases where predictive performance matters more than interpretability.
Core Features
1. Diverse Survival Models
Provides Cox proportional hazards models (including regularized versions), random survival forests, gradient boosting survival analysis, survival support vector machines, and other algorithms, covering needs from interpretable modeling to high-performance prediction.
2. Specialized Evaluation Metrics
Built-in survival-specific metrics include the concordance index (C-index, in Harrell's and Uno's variants), time-dependent AUC, and the Brier score. Unlike standard regression or classification metrics, these account for censored observations when assessing a model.
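To make the concordance index concrete, here is a minimal pure-Python sketch of Harrell's C-index. It is an illustration of the idea, not the library's implementation: the function name `harrell_c_index` and the toy data are assumptions made for this example, and ties in observed time are ignored for brevity.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable pairs that the risk scores order correctly.

    A pair (i, j) is comparable when sample i had an observed event and
    its time is strictly earlier than the time of sample j.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            # A censored sample cannot be the earlier member of a pair,
            # because its true event time is unknown.
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # higher risk failed earlier
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied risk scores count as half
    return concordant / comparable


# Toy data, made up for illustration.
times = [2, 4, 6, 8]
events = [True, False, True, True]
risk_scores = [0.9, 0.3, 0.1, 0.6]

c_index = harrell_c_index(times, events, risk_scores)
print(c_index)
```

Here four pairs are comparable and three of them are ordered correctly, so the result is 0.75. A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking.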
3. Competing Risks and Nonparametric Estimation
Supports the cumulative incidence function (CIF) for competing risks analysis and provides the Kaplan-Meier and Nelson-Aalen nonparametric estimators for survival and cumulative hazard curves.
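As a rough illustration of what the Kaplan-Meier estimator computes, the following pure-Python sketch multiplies, at each observed event time, the fraction of at-risk subjects that survive it. It is a teaching sketch under assumed toy data, not the library's code; the helper name `kaplan_meier` is hypothetical, and it reports only times at which an event was observed.

```python
def kaplan_meier(times, events):
    """Return (event_times, survival_probabilities) for right-censored data."""
    # Distinct times at which at least one event was observed.
    event_times = sorted({t for t, e in zip(times, events) if e})

    survival = 1.0
    out_times, out_survival = [], []
    for t in event_times:
        # Subjects still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        # Events observed exactly at time t.
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1.0 - deaths / at_risk
        out_times.append(t)
        out_survival.append(survival)
    return out_times, out_survival


# Toy data, made up for illustration: False marks a right-censored record.
times = [1, 2, 2, 3, 4, 5]
events = [True, True, False, True, False, True]

km_times, km_survival = kaplan_meier(times, events)
for t, s in zip(km_times, km_survival):
    print(f"S({t}) = {s:.4f}")
```

Censored records never trigger a drop in the curve, but they do shrink the at-risk set for later times, which is how the estimator uses partial follow-up information.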
Frequently Asked Questions
What skill level does scikit-survival require?
Suitable for users with basic Python and pandas/numpy experience. If you are already familiar with scikit-learn's API style, scikit-survival has a very gentle learning curve. The package includes comprehensive documentation from introductory to advanced topics, including detailed references for Cox models, ensemble methods, SVMs, and other models.
How do I choose an appropriate survival model?
If interpretability is required, choose CoxPHSurvivalAnalysis; if the data are high-dimensional (number of features > number of samples), choose CoxnetSurvivalAnalysis; if you seek the highest predictive performance and have sufficient sample size, choose GradientBoostingSurvivalAnalysis or RandomSurvivalForest; for medium-sized datasets consider FastSurvivalSVM.
How are censored data handled?
scikit-survival represents survival outcomes as structured arrays created via sksurv.util.Surv, pairing an event indicator with the observed time for each sample. Right-censoring is the most common case and is handled automatically by the models. At high censoring rates (>40%), Harrell's C-index tends to be overly optimistic, so Uno's C-index is recommended for evaluation.