Changelog
Release 1.0 (2021-03-15)
First Major Version
Breaking Changes:
Removed stable_baselines3.common.cmd_util (already deprecated), please use env_util instead
New Features:
Added support for custom_objects when loading models
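A minimal sketch of the idea behind custom_objects, assuming the saved attributes of a model can be viewed as a plain dictionary: entries supplied by the caller take precedence over the stored values, which helps when a stored object cannot be deserialized. The function name apply_custom_objects and the dictionary keys are hypothetical and do not come from the library.

```python
from typing import Any, Dict, Optional


def apply_custom_objects(
    saved_data: Dict[str, Any],
    custom_objects: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return a copy of saved_data where caller-supplied entries win."""
    merged = dict(saved_data)
    if custom_objects:
        for key, value in custom_objects.items():
            # Assumption of this sketch: only keys that were saved get overridden.
            if key in merged:
                merged[key] = value
    return merged


# Hypothetical saved attributes, one of which cannot be read back.
saved = {"learning_rate": "<unreadable pickled schedule>", "gamma": 0.99}
restored = apply_custom_objects(saved, {"learning_rate": 3e-4, "unknown": 1})
print(restored)
```

With the library itself, such a dictionary would be passed as the custom_objects argument when loading a model.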
Bug Fixes:
Fixed a bug with DQN predict method when using deterministic=False with image space
Documentation:
Fixed examples
Added new project using SB3: rl_reach (@PierreExeter)
Added note about slow-down when switching to PyTorch
Add a note on continual learning and resetting environment
Updated RL-Zoo to reflect the fact that it is more than a collection of trained agents
Added images to illustrate the training loop and custom policies (created with https://excalidraw.com/)
Updated the custom policy section
Pre-Release 0.11.1 (2021-02-27)
Bug Fixes:
Fixed a bug where train_freq was not properly converted when loading a saved model
Pre-Release 0.11.0 (2021-02-27)
Breaking Changes:
evaluate_policy now returns rewards/episode lengths from a Monitor wrapper if one is present, which makes it possible to return the unnormalized reward, for instance with Atari games.
Renamed common.vec_env.is_wrapped to common.vec_env.is_vecenv_wrapped to avoid confusion with the new is_wrapped() helper
Renamed _get_data() to _get_constructor_parameters() for policies (this affects independent saving/loading of policies)
replay_buffer in collect_rollout is no longer optional
Removed n_episodes_rollout and merged it with train_freq, which now accepts a tuple (frequency, unit):
# SB3 < 0.11.0
# model = SAC("MlpPolicy", env, n_episodes_rollout=1, train_freq=-1)
# SB3 >= 0.11.0:
model = SAC("MlpPolicy", env, train_freq=(1, "episode"))
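One way to picture the (frequency, unit) convention is a small normalization helper. This is a sketch under stated assumptions, not the library's code: the name normalize_train_freq is hypothetical, a bare integer is assumed to count steps, and a list is accepted because a tuple typically comes back as a list after a JSON round trip.

```python
from typing import Sequence, Tuple, Union

VALID_UNITS = ("step", "episode")


def normalize_train_freq(train_freq: Union[int, Sequence]) -> Tuple[int, str]:
    """Convert an int or a (frequency, unit) pair into a validated tuple."""
    if isinstance(train_freq, int):
        # Assumption of this sketch: a bare integer counts steps.
        return (train_freq, "step")
    frequency, unit = tuple(train_freq)
    if unit not in VALID_UNITS:
        raise ValueError(f"unit must be one of {VALID_UNITS}, got {unit!r}")
    if not isinstance(frequency, int) or frequency <= 0:
        raise ValueError(f"frequency must be a positive integer, got {frequency!r}")
    return (frequency, unit)


print(normalize_train_freq(4))
print(normalize_train_freq([1, "episode"]))
```

Validating the pair in one place means the rest of the training loop only ever sees a well-formed tuple.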
New Features:
Add support for VecFrameStack to stack on first or last observation dimension, along with automatic check for image spaces.
VecFrameStack now has a channels_order argument to tell if observations should be stacked on the first or last observation dimension (originally always stacked on last).
Added common.env_util.is_wrapped and common.env_util.unwrap_wrapper functions for checking/unwrapping an environment for specific wrapper.
Added env_is_wrapped() method for VecEnv to check if its environments are wrapped with given Gym wrappers.
Added monitor_kwargs parameter to make_vec_env and make_atari_env
Wrap the environments automatically with a Monitor wrapper when possible.
EvalCallback now logs the success rate when available (is_success must be present in the info dict)
Added new wrappers to log images and matplotlib figures to tensorboard. (@zampanteymedio)
Add support for text records to Logger. (@lorenz-h)
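The wrapper checks listed above amount to walking a chain of wrappers. The sketch below illustrates that idea with stand-in classes; Env, Wrapper, Monitor and TimeLimit here are hypothetical minimal classes rather than the Gym or library ones, and the two functions only mirror the names from the list and are not the library's implementation.

```python
from typing import Optional, Type


class Env:
    """Stand-in for a base environment (hypothetical, not gym.Env)."""


class Wrapper(Env):
    """Stand-in for a wrapper that stores the wrapped environment in .env."""

    def __init__(self, env: Env) -> None:
        self.env = env


class Monitor(Wrapper):
    pass


class TimeLimit(Wrapper):
    pass


def unwrap_wrapper(env: Env, wrapper_class: Type[Wrapper]) -> Optional[Wrapper]:
    """Return the first wrapper of the given class in the chain, or None."""
    current = env
    while isinstance(current, Wrapper):
        if isinstance(current, wrapper_class):
            return current
        current = current.env
    return None


def is_wrapped(env: Env, wrapper_class: Type[Wrapper]) -> bool:
    """Tell whether the chain contains a wrapper of the given class."""
    return unwrap_wrapper(env, wrapper_class) is not None


base = Env()
monitored = Monitor(base)
env = TimeLimit(monitored)
print(is_wrapped(env, Monitor), is_wrapped(base, Monitor))
```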
Bug Fixes:
Fixed bug where code added VecTranspose on channel-first image environments (thanks @qxcv)
Fixed DQN predict method when using single gym.Env with deterministic=False
Fixed the incorrect argument order of explained_variance() in ppo.py and a2c.py (@thisray)
Fixed bug where full HerReplayBuffer leads to an index error. (@megan-klaiber)
Fixed bug where replay buffer could not be saved if it was too big (> 4 Gb) for python<3.8 (thanks @hn2)
Added informative PPO construction error in edge-case scenario where n_steps * n_envs = 1 (size of rollout buffer), which otherwise causes downstream breaking errors in training (@decodyng)
Fixed discrete observation space support when using multiple envs with A2C/PPO (thanks @ardabbour)
Fixed a bug for TD3 delayed update (the update was off-by-one and not delayed when train_freq=1)
Fixed numpy warning (replaced np.bool with bool)
Fixed a bug where VecNormalize was not normalizing the terminal observation
Fixed a bug where VecTranspose was not transposing the terminal observation
Fixed a bug where the terminal observation stored in the replay buffer was not the right one for off-policy algorithms
Fixed a bug where action_noise was not used when using HER (thanks @ShangqunYu)
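To see why the argument order of an explained-variance function matters, here is a small pure-Python sketch using the usual definition 1 - Var[y_true - y_pred] / Var[y_true]. It is an illustration with made-up numbers, not the library's implementation; the (y_pred, y_true) parameter order is an assumption of the sketch.

```python
from statistics import pvariance
from typing import Sequence


def explained_variance(y_pred: Sequence[float], y_true: Sequence[float]) -> float:
    """Compute 1 - Var[y_true - y_pred] / Var[y_true]."""
    var_true = pvariance(y_true)
    if var_true == 0:
        return float("nan")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - pvariance(residuals) / var_true


# Made-up data: the predictions overshoot the returns.
returns = [0.0, 0.0, 2.0, 2.0]
values = [0.0, 0.0, 4.0, 4.0]
correct = explained_variance(values, returns)
swapped = explained_variance(returns, values)
print(correct, swapped)
```

The residual variance is the same either way, but the denominator changes with the order, so swapping the arguments silently reports a different score.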
Deprecations:
Others:
Add more issue templates
Add signatures to callable type annotations (@ernestum)
Improve error message in NatureCNN
Added checks for supported action spaces to improve clarity of error messages for the user
Renamed variables in the train() method of SAC, TD3 and DQN to match SB3-Contrib.
Updated docker base image to Ubuntu 18.04
Set tensorboard min version to 2.2.0 (earlier versions are apparently not working with PyTorch)
Added warning for PPO when n_steps * n_envs is not a multiple of batch_size (last mini-batch truncated) (@decodyng)
Removed some warnings in the tests
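The two PPO checks mentioned in this release (the error when n_steps * n_envs = 1 and the warning about a truncated last mini-batch) come down to simple arithmetic on the rollout buffer size. The helper below is a hypothetical sketch of that arithmetic, not the library's code, and its messages are illustrative.

```python
import warnings
from typing import List


def minibatch_sizes(n_steps: int, n_envs: int, batch_size: int) -> List[int]:
    """Sizes of the mini-batches obtained from one rollout buffer."""
    buffer_size = n_steps * n_envs
    if buffer_size <= 1:
        raise ValueError("n_steps * n_envs must be greater than 1")
    remainder = buffer_size % batch_size
    if remainder > 0:
        warnings.warn(
            f"buffer size {buffer_size} is not a multiple of batch_size {batch_size}; "
            f"the last mini-batch will contain {remainder} samples"
        )
    sizes = [batch_size] * (buffer_size // batch_size)
    if remainder:
        sizes.append(remainder)
    return sizes


print(minibatch_sizes(n_steps=2048, n_envs=1, batch_size=64))
print(minibatch_sizes(n_steps=100, n_envs=1, batch_size=64))
```

Choosing batch_size as a divisor of n_steps * n_envs keeps every mini-batch the same size.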
Documentation:
Updated algorithm table
Minor docstring improvements regarding rollout (@stheid)
Fix migration doc for A2C (epsilon parameter)
Fix clip_range docstring
Fix duplicated parameter in EvalCallback docstring (thanks @tfederico)
Added example of learning rate schedule
Added SUMO-RL as example project (@LucasAlegre)
Fix docstring of classes in atari_wrappers.py which were inside the constructor (@LucasAlegre)
Added SB3-Contrib page
Fix bug in the example code of DQN (@AptX395)
Add example on how to access the tensorboard summary writer directly. (@lorenz-h)
Updated migration guide
Updated custom policy doc (separate policy architecture recommended)
Added a note about OpenCV headless version
Corrected typo on documentation (@mschweizer)
Provide the environment when loading the model in the examples (@lorepieri8)
Pre-Release 0.10.0 (2020-10-28)
HER with online and offline sampling, bug fixes for features extraction
Breaking Changes:
Warning: Renamed common.cmd_util to common.env_util for clarity (affects make_vec_env and make_atari_env functions)
New Features:
Allow custom actor/critic network architectures using net_arch=dict(qf=[400, 300], pi=[64, 64]) for off-policy algorithms (SAC, TD3, DDPG)
Added Hindsight Experience Replay HER. (@megan-klaiber)
VecNormalize now supports gym.spaces.Dict observation spaces
Support logging videos to Tensorboard (@SwamyDev)
Added share_features_extractor argument to SAC and TD3 policies
Bug Fixes:
Fix GAE computation for on-policy algorithms (off-by-one for the last value) (thanks @Wovchena)
Fixed potential issue when loading a different environment
Fix ignoring the exclude parameter when recording logs using json, csv or log as logging format (@SwamyDev)
Make make_vec_env support the env_kwargs argument when using an env ID str (@ManifoldFR)
Fix model creation initializing CUDA even when device="cpu" is provided
Fix check_env not checking if the env has a Dict action space before calling _check_nan (@wmmc88)
Update the check for spaces unsupported by Stable Baselines 3 to include checks on the action space (@wmmc88)
Fixed feature extractor bug for target network where the same net was shared instead of being separate. This bug affects SAC, DDPG and TD3 when using CnnPolicy (or custom feature extractor)
Fixed a bug where an environment passed when loading a saved model with a CnnPolicy was not wrapped properly (the bug was introduced when implementing HER so it should not be present in previous versions)
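For readers unfamiliar with the GAE computation mentioned in the first fix, the sketch below shows the textbook recursion in pure Python. It does not reproduce the library's buffer code, and the changelog does not detail the original off-by-one; the comment on the final step only marks the place where an index slip is easy to make. The function name and the dones convention are assumptions of this sketch.

```python
from typing import List, Sequence


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    last_value: float,
    gamma: float = 0.99,
    gae_lambda: float = 0.95,
) -> List[float]:
    """Generalized Advantage Estimation over one rollout.

    dones[t] is True when the episode ended after step t, and
    last_value is the value estimate of the state that follows the final step.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # The final step bootstraps from last_value, not from values[t].
        next_value = last_value if t == len(rewards) - 1 else values[t + 1]
        non_terminal = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * next_value * non_terminal - values[t]
        running = delta + gamma * gae_lambda * non_terminal * running
        advantages[t] = running
    return advantages


print(compute_gae([1.0, 1.0, 1.0], [0.5, 0.5, 0.5], [False, False, False], last_value=0.5))
```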
Deprecations:
Others:
Improved typing coverage
Improved error messages for unsupported spaces
Added .vscode to the gitignore
Pre-Release 0.9.0 (2020-10-03)
Bug fixes, get/set parameters and improved docs
Breaking Changes:
Removed device keyword argument of policies; use policy.to(device) instead. (@qxcv)
Rename BaseClass.get_torch_variables -> BaseClass._get_torch_save_params and BaseClass.excluded_save_params -> BaseClass._excluded_save_params
Renamed saved items tensors to pytorch_variables for clarity
make_atari_env, make_vec_env and set_random_seed must be imported with (and not directly from stable_baselines3.common):
from stable_baselines3.common.cmd_util import make_atari_env, make_vec_env
from stable_baselines3.common.utils import set_random_seed
New Features:
Added unwrap_vec_wrapper() to common.vec_env to extract VecEnvWrapper if needed
Added StopTrainingOnMaxEpisodes to callback collection (@xicocaio)
Added device keyword argument to BaseAlgorithm.load() (@liorcohen5)
Callbacks have access to rollout collection locals as in SB2. (@PartiallyTyped)
Added get_parameters and set_parameters for accessing/setting parameters of the agent
Added actor/critic loss logging for TD3. (@mloo3)
Bug Fixes:
Added unwrap_vec_wrapper() to common.vec_env to extract VecEnvWrapper if needed
Fixed a bug where the environment was reset twice when using evaluate_policy
Fix logging of clip_fraction in PPO (@diditforlulz273)
Fixed a bug where cuda support was wrongly checked when passing the GPU index, e.g., device="cuda:0" (@liorcohen5)
Fixed a bug where the random seed was not properly set on cuda when passing the GPU index
Deprecations:
Others:
Improve typing coverage of the VecEnv
Fix type annotation of make_vec_env (@ManifoldFR)
Removed AlreadySteppingError and NotSteppingError that were not used
Fixed typos in SAC and TD3
Reorganized functions for clarity in BaseClass (save/load functions close to each other, private functions at top)
Clarified docstrings on what is saved and loaded to/from files
Simplified save_to_zip_file function by removing duplicate code
Store library version along with the saved models
DQN loss is now logged
Documentation:
Added StopTrainingOnMaxEpisodes details and example (@xicocaio)
Updated custom policy section (added custom feature extractor example)
Re-enable sphinx_autodoc_typehints
Updated doc style for type hints and remove duplicated type hints
Pre-Release 0.8.0 (2020-08-03)
DQN, DDPG, bug fixes and performance matching for Atari games
Breaking Changes:
AtariWrapper and other Atari wrappers were updated to match SB2 ones
save_replay_buffer now receives as argument the file path instead of the folder path (@tirafesi)
Refactored Critic class for TD3 and SAC, it is now called ContinuousCritic and has an additional parameter n_critics
SAC and TD3 now accept an arbitrary number of critics (e.g. policy_kwargs=dict(n_critics=3)) instead of only 2 previously
New Features:
Added DQN Algorithm (@Artemis-Skade)
Buffer dtype is now set according to action and observation spaces for ReplayBuffer
Added warning when allocation of a buffer may exceed the available memory of the system when psutil is available
Saving models now automatically creates the necessary folders and raises appropriate warnings (@PartiallyTyped)
Refactored opening paths for saving and loading to use strings, pathlib or io.BufferedIOBase (@PartiallyTyped)
Added DDPG algorithm as a special case of TD3.
Introduced BaseModel abstract parent for BasePolicy, which critics inherit from.
Bug Fixes:
Fixed a bug in the close() method of SubprocVecEnv, causing wrappers further down in the wrapper stack to not be closed. (@NeoExtended)
Fix target for updating q values in SAC: the entropy term was not conditioned by terminal states
Use cloudpickle.load instead of pickle.load in CloudpickleWrapper. (@shwang)
Fixed a bug with orthogonal initialization when bias=False in custom policy (@rk37)
Fixed approximate entropy calculation in PPO and A2C. (@andyshih12)
Fixed DQN target network sharing feature extractor with the main network.
Fixed storing correct dones in on-policy algorithm rollout collection. (@andyshih12)
Fixed number of filters in final convolutional layer in NatureCNN to match original implementation.
Deprecations:
Others:
Refactored off-policy algorithm to share the same .learn() method
Split the collect_rollout() method for off-policy algorithms
Added _on_step() for off-policy base class
Optimized replay buffer size by removing the need of next_observations numpy array
Optimized polyak updates (1.5-1.95x speedup) through inplace operations (@PartiallyTyped)
Switch to black codestyle and added make format, make check-codestyle and commit-checks
Ignored errors from newer pytype version
Added a check when using gSDE
Removed codacy dependency from Dockerfile
Added common.sb2_compat.RMSpropTFLike optimizer, which corresponds closer to the implementation of RMSprop from Tensorflow.
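The polyak (soft) update mentioned above blends online parameters into target parameters. A minimal sketch of the in-place form follows, using plain Python floats instead of PyTorch tensors; the function name and signature are illustrative assumptions rather than the library's API.

```python
from typing import List


def polyak_update(params: List[float], target_params: List[float], tau: float) -> None:
    """Soft-update target_params in place: target <- (1 - tau) * target + tau * param."""
    if len(params) != len(target_params):
        raise ValueError("params and target_params must have the same length")
    for i, param in enumerate(params):
        # Writing back into the existing list avoids allocating a new one.
        target_params[i] = (1.0 - tau) * target_params[i] + tau * param


online = [1.0, 4.0]
target = [0.0, 2.0]
polyak_update(online, target, tau=0.5)
print(target)
```

With tau=1.0 the target becomes a copy of the online parameters; small values of tau make the target track them slowly.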
Documentation:
Updated notebook links
Fixed a typo in the section of Enjoy a Trained Agent, in RL Baselines3 Zoo README. (@blurLake)
Added Unity reacher to the projects page (@koulakis)
Added PyBullet colab notebook
Fixed typo in PPO example code (@joeljosephjin)
Fixed typo in custom policy doc (@RaphaelWag)
Pre-Release 0.7.0 (2020-06-10)
Hotfix for PPO/A2C + gSDE, internal refactoring and bug fixes
Breaking Changes:
render() method of VecEnvs now only accepts one argument: mode
Created new file common/torch_layers.py, similar to SB refactoring
  Contains all PyTorch network layer definitions and feature extractors: MlpExtractor, create_mlp, NatureCNN
Renamed BaseRLModel to BaseAlgorithm (along with offpolicy and onpolicy variants)
Moved on-policy and off-policy base algorithms to common/on_policy_algorithm.py and common/off_policy_algorithm.py, respectively.
Moved PPOPolicy to ActorCriticPolicy in common/policies.py
Moved PPO (algorithm class) into OnPolicyAlgorithm (common/on_policy_algorithm.py), to be shared with A2C
Moved following functions from BaseAlgorithm:
  _load_from_file to load_from_zip_file (save_util.py)
  _save_to_file_zip to save_to_zip_file (save_util.py)
  safe_mean to safe_mean (utils.py)
  check_env to check_for_correct_spaces (utils.py. Renamed to avoid confusion with environment checker tools)
Moved static function _is_vectorized_observation from common/policies.py to common/utils.py under name is_vectorized_observation.
Removed {save,load}_running_average functions of VecNormalize in favor of load/save.
Removed use_gae parameter from RolloutBuffer.compute_returns_and_advantage.
New Features:
Bug Fixes:
Fixed render() method for VecEnvs
Fixed seed() method for SubprocVecEnv
Fixed loading on GPU for testing when using gSDE and deterministic=False
Fixed register_policy to allow re-registering same policy for same sub-class (i.e. assign same value to same key).
Fixed a bug where the gradient was passed when using gSDE with PPO/A2C, this does not affect SAC
Deprecations:
Others:
Re-enable unsafe fork start method in the tests (was causing a deadlock with tensorflow)
Added a test for seeding SubprocVecEnv and rendering
Fixed reference in NatureCNN (pointed to older version with different network architecture)
Fixed comments saying “CxWxH” instead of “CxHxW” (same style as in torch docs / commonly used)
Added a few more comments on registering/getting policies (“MlpPolicy”, “CnnPolicy”).
Renamed progress (value from 1 in start of training to 0 in end) to progress_remaining.
Added policies.py files for A2C/PPO, which define MlpPolicy/CnnPolicy (renamed ActorCriticPolicies).
Added some missing tests for VecNormalize, VecCheckNan and PPO.
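The progress_remaining value described above (1 at the start of training, 0 at the end) lends itself to schedule functions. Below is a minimal sketch of a linear learning rate schedule built on that convention; the helper name linear_schedule is an illustrative choice, not something defined in this changelog.

```python
from typing import Callable


def linear_schedule(initial_value: float) -> Callable[[float], float]:
    """Build a schedule that decays linearly from initial_value to 0."""

    def schedule(progress_remaining: float) -> float:
        # progress_remaining goes from 1 (start of training) to 0 (end).
        return progress_remaining * initial_value

    return schedule


lr = linear_schedule(0.001)
print(lr(1.0), lr(0.5), lr(0.0))
```

Because the argument counts down rather than up, multiplying by it directly yields a decaying value without any extra bookkeeping.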
Documentation:
Added a paragraph on “MlpPolicy”/“CnnPolicy” and policy naming scheme under “Developer Guide”
Fixed second-level listing in changelog
Pre-Release 0.6.0 (2020-06-01)
Tensorboard support, refactored logger
Breaking Changes:
Remove State-Dependent Exploration (SDE) support for TD3
Methods were renamed in the logger:
  logkv -> record, writekvs -> write, writeseq -> write_sequence, logkvs -> record_dict, dumpkvs -> dump, getkvs -> get_log_dict, logkv_mean -> record_mean
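The renames above can be captured in a lookup table. The sketch below adds a hypothetical helper, migrate_logger_calls, that rewrites old method calls in a source string; it is a simple textual substitution offered as an illustration, so it may also touch unrelated functions that happen to share one of the old names.

```python
import re

# Old logger method name -> new name, as listed in the breaking changes.
LOGGER_RENAMES = {
    "logkv": "record",
    "writekvs": "write",
    "writeseq": "write_sequence",
    "logkvs": "record_dict",
    "dumpkvs": "dump",
    "getkvs": "get_log_dict",
    "logkv_mean": "record_mean",
}

# Match whole names followed by "(", so logkv does not clobber logkvs or logkv_mean.
_PATTERN = re.compile(
    r"\b(" + "|".join(sorted(LOGGER_RENAMES, key=len, reverse=True)) + r")\b(?=\()"
)


def migrate_logger_calls(source: str) -> str:
    """Rewrite old logger method calls in a source string to the new names."""
    return _PATTERN.sub(lambda match: LOGGER_RENAMES[match.group(1)], source)


print(migrate_logger_calls('logger.logkv("reward", 1.0); logger.dumpkvs()'))
```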
New Features:
Added env checker (Sync with Stable Baselines)
Added VecCheckNan and VecVideoRecorder (Sync with Stable Baselines)
Added determinism tests
Added cmd_util and atari_wrappers
Added support for MultiDiscrete and MultiBinary observation spaces (@rolandgvc)
Added MultiCategorical and Bernoulli distributions for PPO/A2C (@rolandgvc)
Added support for logging to tensorboard (@rolandgvc)
Added VectorizedActionNoise for continuous vectorized environments (@PartiallyTyped)
Log evaluation in the EvalCallback using the logger
Bug Fixes:
Fixed a bug that prevented models trained on CPU from being loaded on GPU
Fixed version number that had a new line included
Fixed weird seg fault in docker image due to FakeImageEnv by reducing screen size
Fixed sde_sample_freq that was not taken into account for SAC
Pass logger module to BaseCallback, otherwise callbacks cannot write in the one used by the algorithms
Deprecations:
Others:
Renamed to Stable-Baselines3
Added Dockerfile
Sync VecEnvs with Stable-Baselines
Update requirement: gym>=0.17
Added .readthedoc.yml file
Added flake8 and make lint command
Added Github workflow
Added warning when passing both train_freq and n_episodes_rollout to Off-Policy Algorithms
Documentation:
Added most documentation (adapted from Stable-Baselines)
Added link to CONTRIBUTING.md in the README (@kinalmehta)
Added gSDE project and update docstrings accordingly
Fix TD3 example code block
Pre-Release 0.5.0 (2020-05-05)
CnnPolicy support for image observations, complete saving/loading for policies
Breaking Changes:
Previous loading of policy weights is broken and replaced by the new saving/loading for policy
New Features:
Added optimizer_class and optimizer_kwargs to policy_kwargs in order to easily customize optimizers
Complete independent save/load for policies
Add CnnPolicy and VecTransposeImage to support images as input
Bug Fixes:
Fixed reset_num_timesteps behavior, so env.reset() is not called if reset_num_timesteps=True
Fixed squashed_output that was not passed to policy constructor for SAC and TD3 (would result in scaled actions for unscaled action spaces)
Deprecations:
Others:
Cleanup rollout return
Added get_device util to manage PyTorch devices
Added type hints to logger + use f-strings
Documentation:
Pre-Release 0.4.0 (2020-02-14)
Proper pre-processing, independent save/load for policies
Breaking Changes:
Removed CEMRL
Model saved with previous versions cannot be loaded (because of the pre-processing)
New Features:
Add support for Discrete observation spaces
Add saving/loading for policy weights, so the policy can be used without the model
Bug Fixes:
Fix type hint for activation functions
Deprecations:
Others:
Refactor handling of observation and action spaces
Refactored features extraction to have proper preprocessing
Refactored action distributions
Pre-Release 0.3.0 (2020-02-14)
Bug fixes, sync with Stable-Baselines, code cleanup
Breaking Changes:
Removed default seed
Bump dependencies (PyTorch and Gym)
predict() now returns a tuple to match Stable-Baselines behavior
New Features:
Better logging for SAC and PPO
Bug Fixes:
Synced callbacks with Stable-Baselines
Fixed colors in results_plotter
Fix entropy computation (now summed over action dim)
Others:
SAC with SDE now samples only one matrix
Added clip_mean parameter to SAC policy
Buffers now return NamedTuple
More typing
Add test for expln
Renamed learning_rate to lr_schedule
Add version.txt
Add more tests for distribution
Documentation:
Deactivated sphinx_autodoc_typehints extension
Pre-Release 0.2.0 (2020-02-14)
Python 3.6+ required, type checking, callbacks, doc build
Breaking Changes:
Python 2 support was dropped, Stable Baselines3 now requires Python 3.6 or above
Return type of evaluation.evaluate_policy() has been changed
Refactored the replay buffer to avoid transformation between PyTorch and NumPy
Created OffPolicyRLModel base class
Remove deprecated JSON format for Monitor
New Features:
Add seed() method to VecEnv class
Add support for Callback (cf https://github.com/hill-a/stable-baselines/pull/644)
Add methods for saving and loading replay buffer
Add extend() method to the buffers
Add get_vec_normalize_env() to BaseRLModel to retrieve VecNormalize wrapper when it exists
Add results_plotter from Stable Baselines
Improve predict() method to handle different types of observations (single, vectorized, …)
Bug Fixes:
Fix loading on CPU of models that were trained on GPU
Fix reset_num_timesteps that was not used
Fix entropy computation for squashed Gaussian (approximate it now)
Fix seeding when using multiple environments (different seed per env)
Others:
Add type check
Converted all format strings to f-strings
Add test for OrnsteinUhlenbeckActionNoise
Add type aliases in common.type_aliases
Documentation:
Fix documentation build
Pre-Release 0.1.0 (2020-01-20)
First Release: base algorithms and state-dependent exploration
New Features:
Initial release of A2C, CEM-RL, PPO, SAC and TD3, working only with Box input space
State-Dependent Exploration (SDE) for A2C, PPO, SAC and TD3
Maintainers
Stable-Baselines3 is currently maintained by Antonin Raffin (aka @araffin), Ashley Hill (aka @hill-a), Maximilian Ernestus (aka @ernestum), Adam Gleave (@AdamGleave) and Anssi Kanervisto (aka @Miffyli).
Contributors:
In random order…
Thanks to the maintainers of V2: @hill-a @enerijunior @AdamGleave @Miffyli
And all the contributors: @bjmuld @iambenzo @iandanforth @r7vme @brendenpetersen @huvar @abhiskk @JohannesAck @EliasHasle @mrakgr @Bleyddyn @antoine-galataud @junhyeokahn @AdamGleave @keshaviyengar @tperol @XMaster96 @kantneel @Pastafarianist @GerardMaggiolino @PatrickWalter214 @yutingsz @sc420 @Aaahh @billtubbs @Miffyli @dwiel @miguelrass @qxcv @jaberkow @eavelardev @ruifeng96150 @pedrohbtp @srivatsankrishnan @evilsocket @MarvineGothic @jdossgollin @stheid @SyllogismRXS @rusu24edward @jbulow @Antymon @seheevic @justinkterry @edbeeching @flodorner @KuKuXia @NeoExtended @PartiallyTyped @mmcenta @richardwu @kinalmehta @rolandgvc @tkelestemur @mloo3 @tirafesi @blurLake @koulakis @joeljosephjin @shwang @rk37 @andyshih12 @RaphaelWag @xicocaio @diditforlulz273 @liorcohen5 @ManifoldFR @mloo3 @SwamyDev @wmmc88 @megan-klaiber @thisray @tfederico @hn2 @LucasAlegre @AptX395 @zampanteymedio @decodyng @ardabbour @lorenz-h @mschweizer @lorepieri8 @ShangqunYu @PierreExeter