stable-baselines3
Production-ready reinforcement learning algorithms (PPO, SAC, DQN, TD3, DDPG, A2C) with scikit-learn-like API. Use for standard RL experiments, quick prototyping, and well-documented algorithm implementations. Best for single-agent RL with Gymnasium environments. For high-performance parallel training, multi-agent systems, or custom vectorized environments, use pufferlib instead.
Category: AI Skill Development
Install
Download and extract to your skills directory
Copy the command below and send it to OpenClaw for auto-install:
Download and install this skill https://openskills.cc/api/download?slug=k-dense-ai-scientific-skills-stable-baselines3&locale=en&source=copy
Stable Baselines3 - Production-Grade Reinforcement Learning Library
Overview
Stable Baselines3 is a PyTorch-based reinforcement learning library that provides production-grade implementations of algorithms like PPO, SAC, DQN, and TD3. It uses a concise API similar to scikit-learn, suitable for standard single-agent RL experiments and rapid prototyping.
Applicable Scenarios
1. Single-Agent Reinforcement Learning Research
Use Gymnasium-compatible environments for standard single-agent RL experiments. It supports discrete and continuous action spaces and provides well-validated algorithm implementations, making it suitable for paper reproduction and algorithm comparison studies.
2. RL Rapid Prototyping
Quickly build training workflows with a simple, unified API. Built-in environment checks, model saving, evaluation, and callback systems help developers validate ideas and iterate on algorithms quickly.
3. Reinforcement Learning Teaching and Learning
Provides complete training templates, environment creation guides, and algorithm comparison documentation. The codebase is clear and easy to understand, making it an ideal tool for learning algorithm implementations and RL engineering practices.
Core Features
1. Implementations of Many Classic RL Algorithms
Offers reliable implementations of mainstream reinforcement learning algorithms such as PPO, SAC, TD3, DQN, DDPG, A2C, and HER. All algorithms are built on PyTorch, support both CPU and GPU training, and are extensively tested for correct behavior.
2. Flexible Environment System
Supports Gymnasium standard environments, custom environments, and vectorized parallel environments. Provides the environment validation tool check_env(), supports automatic normalization of image observations, and is compatible with DummyVecEnv and SubprocVecEnv to accelerate training.
3. Comprehensive Training Monitoring Tools
Includes ready-made callbacks such as EvalCallback and CheckpointCallback, supporting training progress visualization, automatic model saving, early stopping, and TensorBoard integration. Custom callbacks can be added by subclassing BaseCallback.
Frequently Asked Questions
Which reinforcement learning algorithms does Stable Baselines3 support?
Stable Baselines3 supports PPO, SAC, DQN, TD3, DDPG, A2C, and others, as well as HER (Hindsight Experience Replay) for goal-conditioned tasks. PPO and A2C are suitable for general-purpose scenarios and support all action space types; SAC and TD3 focus on continuous control tasks and are sample-efficient; DQN is used for tasks with discrete action spaces. Refer to the algorithm comparison documentation to choose the appropriate algorithm for your task.
How do I create and validate a custom Gymnasium environment?
A custom environment should inherit from gymnasium.Env and implement __init__, reset, step, render, and close methods. Define action_space and observation_space in __init__, and have step return a five-tuple (observation, reward, terminated, truncated, info). Use check_env(env) to validate that the environment API conforms to the specification. Image observations should use np.uint8 format and a channel-first layout; SB3 will automatically divide images by 255 to normalize them.
Does Stable Baselines3 support multi-agent training?
Stable Baselines3 focuses on single-agent reinforcement learning and does not directly support multi-agent systems. For multi-agent training, high-performance parallel environments, or custom vectorized environments, consider using the pufferlib framework. SB3’s vectorized environment functionality is intended to accelerate single-agent training, not multi-agent interactions.