Vectorized Environments
Vectorized Environments are a method for stacking multiple independent environments into a single environment.
Instead of training an RL agent on 1 environment per step, it allows us to train it on n environments per step.
Because of this, actions passed to the environment are now a vector (of dimension n).
It is the same for observations, rewards and end of episode signals (dones).
In the case of non-array observation spaces such as Dict or Tuple, where different sub-spaces may have different shapes, the sub-observations are vectors (of dimension n).
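To make the shapes concrete, the following is a minimal NumPy sketch of how n per-environment observations could be batched into the vectorized form described above. stack_obs is a hypothetical helper written for illustration; it is not part of the Stable Baselines3 API, and the library's own batching code may differ.

```python
import numpy as np


def stack_obs(per_env_obs):
    """Batch a list of n per-environment observations (hypothetical helper).

    - array observations -> one array with a leading dimension of size n
    - Dict observations  -> a dict whose values are each stacked to size n
    - Tuple observations -> a tuple whose items are each stacked to size n
    """
    first = per_env_obs[0]
    if isinstance(first, dict):
        # Sub-spaces may have different shapes, so stack key by key.
        return {key: np.stack([obs[key] for obs in per_env_obs]) for key in first}
    if isinstance(first, tuple):
        return tuple(np.stack([obs[i] for obs in per_env_obs]) for i in range(len(first)))
    return np.stack(per_env_obs)


# Three environments, each returning a Dict observation with differently shaped sub-spaces.
per_env = [{"position": np.zeros(2), "image": np.zeros((4, 4, 3))} for _ in range(3)]
batched = stack_obs(per_env)
print(batched["position"].shape)  # (3, 2)
print(batched["image"].shape)     # (3, 4, 4, 3)
```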
Name | Box | Discrete | Dict | Tuple | Multi Processing |
---|---|---|---|---|---|
DummyVecEnv | ✔️ | ✔️ | ✔️ | ✔️ | ❌️ |
SubprocVecEnv | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
Note
Vectorized environments are required when using wrappers for frame-stacking or normalization.
Note
When using vectorized environments, the environments are automatically reset at the end of each episode.
Thus, the observation returned for the i-th environment when done[i] is true will in fact be the first observation of the next episode, not the last observation of the episode that has just terminated.
You can access the “real” final observation of the terminated episode (that is, the one that accompanied the done event provided by the underlying environment) using the terminal_observation key in the info dicts returned by the VecEnv.
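The auto-reset behaviour can be sketched in plain Python. CountdownEnv and ToyVecEnv below are hypothetical toy classes, not Stable Baselines3 code; they only mimic the documented contract: when a sub-environment finishes, its last observation is stored under info["terminal_observation"] and the returned observation already belongs to the next episode.

```python
class CountdownEnv:
    """Hypothetical toy env: the observation counts down from `length`; the episode ends at 0."""

    def __init__(self, length):
        self.length = length
        self.remaining = length

    def reset(self):
        self.remaining = self.length
        return self.remaining

    def step(self, action):
        self.remaining -= 1
        done = self.remaining == 0
        return self.remaining, 1.0, done, {}


class ToyVecEnv:
    """Sequential vectorized env that auto-resets finished sub-environments."""

    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        observations, rewards, dones, infos = [], [], [], []
        for env, action in zip(self.envs, actions):
            obs, reward, done, info = env.step(action)
            if done:
                # Keep the real last observation, then immediately start the next episode.
                info["terminal_observation"] = obs
                obs = env.reset()
            observations.append(obs)
            rewards.append(reward)
            dones.append(done)
            infos.append(info)
        return observations, rewards, dones, infos


venv = ToyVecEnv([lambda: CountdownEnv(1), lambda: CountdownEnv(3)])
venv.reset()
obs, rewards, dones, infos = venv.step([0, 0])
print(dones)                              # [True, False]
print(obs)                                # [1, 2]  (env 0 was already reset)
print(infos[0]["terminal_observation"])   # 0
```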
Warning
When using SubprocVecEnv, users must wrap the code in an if __name__ == "__main__": block if using the forkserver or spawn start method (default on Windows).
On Linux, the default start method is fork, which is not thread safe and can create deadlocks.
For more information, see Python’s multiprocessing guidelines.
VecEnv
class stable_baselines3.common.vec_env.VecEnv(num_envs, observation_space, action_space)
An abstract asynchronous, vectorized environment.
- Parameters
  num_envs (int) – the number of environments
  observation_space (Space) – the observation space
  action_space (Space) – the action space
abstract env_is_wrapped(wrapper_class, indices=None)
Check if environments are wrapped with a given wrapper.
- Parameters
  wrapper_class (Type[gym.Wrapper]) – The wrapper class to look for.
  indices (Union[None, int, Iterable[int]]) – Indices of envs to check
- Return type
  List[bool]
- Returns
  True if the env is wrapped, False otherwise, for each env queried.
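A minimal sketch of what "wrapped with a given wrapper" means, assuming (as gym-style wrappers do) that each wrapper stores the environment it wraps in an env attribute. All classes here are hypothetical stand-ins, and this is not the library's implementation.

```python
class Wrapper:
    """Hypothetical minimal wrapper base: the wrapped environment is kept in `.env`."""

    def __init__(self, env):
        self.env = env


class TimeLimitWrapper(Wrapper):
    pass


class LoggingWrapper(Wrapper):
    pass


class BaseEnv:
    pass


def is_wrapped(env, wrapper_class) -> bool:
    """Walk down the wrapper chain and report whether `wrapper_class` appears in it."""
    while isinstance(env, Wrapper):
        if isinstance(env, wrapper_class):
            return True
        env = env.env  # peel off one layer
    return False


envs = [LoggingWrapper(TimeLimitWrapper(BaseEnv())), TimeLimitWrapper(BaseEnv()), BaseEnv()]
# One bool per environment, matching the documented List[bool] return type.
print([is_wrapped(env, LoggingWrapper) for env in envs])    # [True, False, False]
print([is_wrapped(env, TimeLimitWrapper) for env in envs])  # [True, True, False]
```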
abstract env_method(method_name, *method_args, indices=None, **method_kwargs)
Call instance methods of vectorized environments.
- Parameters
  method_name (str) – The name of the environment method to invoke.
  indices (Union[None, int, Iterable[int]]) – Indices of envs whose method to call
  method_args – Any positional arguments to provide in the call
  method_kwargs – Any keyword arguments to provide in the call
- Return type
  List[Any]
- Returns
  List of items returned by the environment’s method call
abstract get_attr(attr_name, indices=None)
Return attribute from vectorized environment.
- Parameters
  attr_name (str) – The name of the attribute whose value to return
  indices (Union[None, int, Iterable[int]]) – Indices of envs to get attribute from
- Return type
  List[Any]
- Returns
  List of values of ‘attr_name’ in all environments
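The indices argument accepted by env_method, get_attr and set_attr can be None, a single int, or an iterable of ints. The sketch below shows one plausible way to normalize it and dispatch calls over a plain list of environments; get_indices, Counter and the module-level envs list are hypothetical names used only for illustration.

```python
from typing import Any, Iterable, List, Union

Indices = Union[None, int, Iterable[int]]


def get_indices(num_envs: int, indices: Indices) -> List[int]:
    """Turn the `indices` argument into an explicit list of environment indices."""
    if indices is None:
        return list(range(num_envs))  # None means every environment
    if isinstance(indices, int):
        return [indices]  # a single environment
    return list(indices)  # any iterable of indices


class Counter:
    """Hypothetical toy environment exposing one attribute and one method."""

    def __init__(self, start: int):
        self.value = start

    def add(self, amount: int) -> int:
        self.value += amount
        return self.value


envs = [Counter(0), Counter(10), Counter(20)]


def env_method(method_name: str, *method_args: Any, indices: Indices = None, **method_kwargs: Any) -> List[Any]:
    # Call the named method on each selected env and collect the return values.
    return [
        getattr(envs[i], method_name)(*method_args, **method_kwargs)
        for i in get_indices(len(envs), indices)
    ]


def get_attr(attr_name: str, indices: Indices = None) -> List[Any]:
    # Read the named attribute from each selected env.
    return [getattr(envs[i], attr_name) for i in get_indices(len(envs), indices)]


print(env_method("add", 5, indices=[0, 2]))  # [5, 25]
print(get_attr("value"))                     # [5, 10, 25]
print(get_attr("value", indices=1))          # [10]
```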
getattr_depth_check(name, already_found)
Check if an attribute reference is being hidden in a recursive call to __getattr__
- Parameters
  name (str) – name of attribute to check for
  already_found (bool) – whether this attribute has already been found in a wrapper
- Return type
  Optional[str]
- Returns
  name of module whose attribute is being shadowed, if any.
render(mode='human')
Gym environment rendering
- Parameters
  mode (str) – the rendering type
- Return type
  Optional[ndarray]
abstract reset()
Reset all the environments and return an array of observations, or a dict or tuple of observation arrays.
If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.
- Return type
  Union[ndarray, Dict[str, ndarray], Tuple[ndarray, …]]
- Returns
  observation
abstract seed(seed=None)
Sets the random seeds for all environments, based on a given seed. Each individual environment will still get its own seed, by incrementing the given seed.
- Parameters
  seed (Optional[int]) – The random seed. May be None for completely random seeding.
- Return type
  List[Optional[int]]
- Returns
  A list containing the seeds for each individual env. Note that all list elements may be None, if the env does not return anything when being seeded.
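The seeding rule described above (one seed per environment, obtained by incrementing the given seed) amounts to the following. per_env_seeds is a hypothetical helper; passing None through unchanged is an assumption made for illustration, and the real method returns whatever each environment's own seeding call returns rather than the derived seeds.

```python
from typing import List, Optional


def per_env_seeds(seed: Optional[int], num_envs: int) -> List[Optional[int]]:
    """Derive one seed per environment by adding the env index to a base seed."""
    if seed is None:
        # Assumption for this sketch: no base seed means no explicit seed for any env.
        return [None] * num_envs
    return [seed + idx for idx in range(num_envs)]


print(per_env_seeds(42, 4))    # [42, 43, 44, 45]
print(per_env_seeds(None, 2))  # [None, None]
```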
abstract set_attr(attr_name, value, indices=None)
Set attribute inside vectorized environments.
- Parameters
  attr_name (str) – The name of attribute to assign new value
  value (Any) – Value to assign to attr_name
  indices (Union[None, int, Iterable[int]]) – Indices of envs to assign value
- Return type
  None
step(actions)
Step the environments with the given actions.
- Parameters
  actions (ndarray) – the actions, one per environment
- Return type
  Tuple[Union[ndarray, Dict[str, ndarray], Tuple[ndarray, …]], ndarray, ndarray, List[Dict]]
- Returns
  observation, reward, done, information
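The reset() documentation above refers to step_async() and step_wait(). Conceptually, a synchronous step() is step_async() followed by step_wait(), which is what lets an implementation such as SubprocVecEnv send actions to all workers before collecting any results. AsyncStepSketch is a hypothetical toy class with made-up dynamics, shown only to illustrate the two-phase split.

```python
class AsyncStepSketch:
    """Hypothetical illustration of splitting a vectorized step into two phases."""

    def __init__(self, num_envs):
        self.num_envs = num_envs
        self.pending_actions = None

    def step_async(self, actions):
        # Phase 1: hand the actions over (a real implementation might send them to worker processes).
        if len(actions) != self.num_envs:
            raise ValueError("expected one action per environment")
        self.pending_actions = list(actions)

    def step_wait(self):
        # Phase 2: wait until every environment has finished, then collect the results.
        if self.pending_actions is None:
            raise RuntimeError("step_wait() called before step_async()")
        obs = [a * 10 for a in self.pending_actions]  # toy dynamics
        rewards = [float(a) for a in self.pending_actions]
        dones = [False] * self.num_envs
        infos = [{} for _ in range(self.num_envs)]
        self.pending_actions = None
        return obs, rewards, dones, infos

    def step(self, actions):
        # A synchronous step: start the step, then wait for its results.
        self.step_async(actions)
        return self.step_wait()


venv = AsyncStepSketch(num_envs=2)
obs, rewards, dones, infos = venv.step([1, 2])
print(obs, rewards)  # [10, 20] [1.0, 2.0]
```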
DummyVecEnv
class stable_baselines3.common.vec_env.DummyVecEnv(env_fns)
Creates a simple vectorized wrapper for multiple environments, calling each environment in sequence on the current Python process. This is useful for computationally simple environments such as CartPole-v1, as the overhead of multiprocessing or multithreading outweighs the environment computation time. This can also be used for RL methods that require a vectorized environment, but that you want to train with a single environment.
- Parameters
  env_fns (List[Callable[[], Env]]) – a list of functions that return environments to vectorize
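DummyVecEnv expects env_fns to be a list of zero-argument callables rather than environment instances. A common way to build that list is a small factory function; the sketch below uses a hypothetical ToyEnv and make_env so that it runs without Gym, and also shows the late-binding pitfall of creating lambdas in a loop. With a real environment factory, the resulting list would be passed as DummyVecEnv(env_fns).

```python
from typing import Callable, List


class ToyEnv:
    """Hypothetical stand-in for an environment; records which seed it was built with."""

    def __init__(self, seed: int):
        self.seed = seed


def make_env(rank: int, base_seed: int = 0) -> Callable[[], ToyEnv]:
    """Return a zero-argument factory, the form expected in env_fns."""
    def _init() -> ToyEnv:
        return ToyEnv(seed=base_seed + rank)
    return _init


# Correct: each factory captures its own `rank`.
env_fns: List[Callable[[], ToyEnv]] = [make_env(i, base_seed=100) for i in range(3)]
print([fn().seed for fn in env_fns])  # [100, 101, 102]

# Pitfall: lambdas created in a loop all see the final value of `i`.
buggy_fns = [lambda: ToyEnv(seed=100 + i) for i in range(3)]
print([fn().seed for fn in buggy_fns])  # [102, 102, 102]
```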
env_is_wrapped(wrapper_class, indices=None)
Check if worker environments are wrapped with a given wrapper
- Return type
  List[bool]
env_method(method_name, *method_args, indices=None, **method_kwargs)
Call instance methods of vectorized environments.
- Return type
  List[Any]
get_attr(attr_name, indices=None)
Return attribute from vectorized environment (see base class).
- Return type
  List[Any]
render(mode='human')
Gym environment rendering. If there are multiple environments, then they are tiled together in one image via BaseVecEnv.render(). Otherwise (if self.num_envs == 1), we pass the render call directly to the underlying environment. Therefore, some arguments such as mode will have values that are valid only when num_envs == 1.
- Parameters
  mode (str) – The rendering type.
- Return type
  Optional[ndarray]
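To illustrate what tiling "together in one image" involves, here is a NumPy sketch that arranges a batch of N frames into a roughly square grid, padding with black frames when N does not fill the grid. tile_images_sketch is a hypothetical helper; the exact layout produced by the library may differ.

```python
import math

import numpy as np


def tile_images_sketch(images: np.ndarray) -> np.ndarray:
    """Tile a batch of N images shaped (N, H, W, C) into one grid image."""
    n, h, w, c = images.shape
    rows = int(math.ceil(math.sqrt(n)))
    cols = int(math.ceil(n / rows))
    # Pad with black images so the batch fills the grid exactly.
    padding = np.zeros((rows * cols - n, h, w, c), dtype=images.dtype)
    images = np.concatenate([images, padding], axis=0)
    # (rows, cols, H, W, C) -> (rows, H, cols, W, C) -> (rows*H, cols*W, C)
    grid = images.reshape(rows, cols, h, w, c).transpose(0, 2, 1, 3, 4)
    return grid.reshape(rows * h, cols * w, c)


frames = np.ones((3, 2, 2, 3), dtype=np.uint8)
print(tile_images_sketch(frames).shape)  # (4, 4, 3)
```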
reset()
Reset all the environments and return an array of observations, or a dict or tuple of observation arrays.
If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.
- Return type
  Union[ndarray, Dict[str, ndarray], Tuple[ndarray, …]]
- Returns
  observation
seed(seed=None)
Sets the random seeds for all environments, based on a given seed. Each individual environment will still get its own seed, by incrementing the given seed.
- Parameters
  seed (Optional[int]) – The random seed. May be None for completely random seeding.
- Return type
  List[Optional[int]]
- Returns
  A list containing the seeds for each individual env. Note that all list elements may be None, if the env does not return anything when being seeded.
set_attr(attr_name, value, indices=None)
Set attribute inside vectorized environments (see base class).
- Return type
  None
SubprocVecEnv
class stable_baselines3.common.vec_env.SubprocVecEnv(env_fns, start_method=None)
Creates a multiprocess vectorized wrapper for multiple environments, distributing each environment to its own process, allowing a significant speed-up when the environment is computationally complex.
For performance reasons, if your environment is not IO bound, the number of environments should not exceed the number of logical cores on your CPU.
Warning
Only ‘forkserver’ and ‘spawn’ start methods are thread-safe, which is important when TensorFlow sessions or other non thread-safe libraries are used in the parent (see issue #217). However, compared to ‘fork’ they incur a small start-up cost and have restrictions on global variables. With those methods, users must wrap the code in an if __name__ == "__main__": block. For more information, see the multiprocessing documentation.
- Parameters
  env_fns (List[Callable[[], Env]]) – Environments to run in subprocesses
  start_method (Optional[str]) – method used to start the subprocesses. Must be one of the methods returned by multiprocessing.get_all_start_methods(). Defaults to ‘forkserver’ on available platforms, and ‘spawn’ otherwise.
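The documented default for start_method and the advice on the number of environments can be expressed with the standard library alone. default_start_method and capped_num_envs are hypothetical helpers; the core cap only applies when the environment is not IO bound. The if __name__ == "__main__": guard is included because it is required with the ‘forkserver’ and ‘spawn’ start methods.

```python
import multiprocessing
import os


def default_start_method() -> str:
    """'forkserver' where the platform offers it, otherwise 'spawn' (the documented default)."""
    if "forkserver" in multiprocessing.get_all_start_methods():
        return "forkserver"
    return "spawn"


def capped_num_envs(requested: int) -> int:
    """Do not ask for more subprocess environments than there are logical cores."""
    logical_cores = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, min(requested, logical_cores))


if __name__ == "__main__":
    # With 'forkserver' or 'spawn', code that creates subprocesses must live under this guard.
    print(default_start_method())
    print(capped_num_envs(64))
```

With the documented signature, these values would feed into SubprocVecEnv(env_fns, start_method=...), with env_fns holding capped_num_envs(n) factories.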
env_is_wrapped(wrapper_class, indices=None)
Check if worker environments are wrapped with a given wrapper
- Return type
  List[bool]
env_method(method_name, *method_args, indices=None, **method_kwargs)
Call instance methods of vectorized environments.
- Return type
  List[Any]
get_attr(attr_name, indices=None)
Return attribute from vectorized environment (see base class).
- Return type
  List[Any]
reset()
Reset all the environments and return an array of observations, or a dict or tuple of observation arrays.
If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.
- Return type
  Union[ndarray, Dict[str, ndarray], Tuple[ndarray, …]]
- Returns
  observation
seed(seed=None)
Sets the random seeds for all environments, based on a given seed. Each individual environment will still get its own seed, by incrementing the given seed.
- Parameters
  seed (Optional[int]) – The random seed. May be None for completely random seeding.
- Return type
  List[Optional[int]]
- Returns
  A list containing the seeds for each individual env. Note that all list elements may be None, if the env does not return anything when being seeded.
set_attr(attr_name, value, indices=None)
Set attribute inside vectorized environments (see base class).
- Return type
  None
Wrappers
VecFrameStack
class stable_baselines3.common.vec_env.VecFrameStack(venv, n_stack, channels_order=None)
Frame stacking wrapper for vectorized environment. Designed for image observations.
Dimension to stack over is either first (channels-first) or last (channels-last), which is detected automatically using common.preprocessing.is_image_space_channels_first if observation is an image space.
- Parameters
  venv (VecEnv) – the vectorized environment to wrap
  n_stack (int) – Number of frames to stack
  channels_order (Optional[str]) – If “first”, stack on first image dimension. If “last”, stack on last dimension. If None, automatically detect channel to stack over in case of image observation or default to “last” (default).
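A minimal NumPy sketch of stacking the last n_stack frames along the last image dimension (channels-last) or the first one (channels-first). FrameStackSketch is hypothetical: it does not reproduce VecFrameStack's internals or its automatic channel detection, and clearing the history of environments whose episode just ended is a design choice assumed here.

```python
from collections import deque

import numpy as np


class FrameStackSketch:
    """Hypothetical frame stacker for batched image observations shaped (n_envs, *image_shape)."""

    def __init__(self, n_stack: int, channels_order: str = "last"):
        # Batch axis 0 is the environment index, so the first image axis is batch axis 1.
        self.axis = -1 if channels_order == "last" else 1
        self.frames = deque(maxlen=n_stack)

    def reset(self, obs: np.ndarray) -> np.ndarray:
        # Pre-fill with blank frames so the stacked shape is constant from the first step.
        for _ in range(self.frames.maxlen):
            self.frames.append(np.zeros_like(obs))
        return self.step(obs, dones=np.zeros(len(obs), dtype=bool))

    def step(self, obs: np.ndarray, dones: np.ndarray) -> np.ndarray:
        # Forget the history of environments whose episode just ended.
        for frame in self.frames:
            frame[dones] = 0
        self.frames.append(np.array(obs, copy=True))  # copy so callers' arrays are never modified
        # Oldest frame first, newest frame last along the stacking axis.
        return np.concatenate(list(self.frames), axis=self.axis)


stacker = FrameStackSketch(n_stack=4, channels_order="last")
stacked = stacker.reset(np.ones((2, 8, 8, 3)))
print(stacked.shape)  # (2, 8, 8, 12)
```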
VecNormalize
class stable_baselines3.common.vec_env.VecNormalize(venv, training=True, norm_obs=True, norm_reward=True, clip_obs=10.0, clip_reward=10.0, gamma=0.99, epsilon=1e-08)
A moving average, normalizing wrapper for vectorized environments. It supports saving and loading the moving averages.
- Parameters
  venv (VecEnv) – the vectorized environment to wrap
  training (bool) – Whether or not to update the moving average
  norm_obs (bool) – Whether or not to normalize observations (default: True)
  norm_reward (bool) – Whether or not to normalize rewards (default: True)
  clip_obs (float) – Max absolute value for observation
  clip_reward (float) – Max absolute value for discounted reward
  gamma (float) – discount factor
  epsilon (float) – To avoid division by zero
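The statistics behind these parameters can be sketched in NumPy, assuming a running mean/variance estimate updated batch by batch with the standard parallel-variance formula. In this sketch, observations are centred and scaled, then clipped to ±clip_obs; rewards are divided by the standard deviation of a running discounted return (not re-centred) and clipped to ±clip_reward. RunningMeanStdSketch and NormalizeSketch are hypothetical names, and details such as the initial pseudo-count of 1e-4 are assumptions rather than the library's exact code.

```python
import numpy as np


class RunningMeanStdSketch:
    """Running mean/variance, merged batch by batch (parallel-variance update)."""

    def __init__(self, shape=(), epsilon=1e-4):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = epsilon  # small pseudo-count so the first update is well defined

    def update(self, batch):
        batch_mean = batch.mean(axis=0)
        batch_var = batch.var(axis=0)
        batch_count = batch.shape[0]
        delta = batch_mean - self.mean
        total = self.count + batch_count
        # Combine the stored moments with the batch moments.
        m2 = self.var * self.count + batch_var * batch_count + delta**2 * self.count * batch_count / total
        self.mean = self.mean + delta * batch_count / total
        self.var = m2 / total
        self.count = total


class NormalizeSketch:
    """Hypothetical stand-in for the observation/reward normalization logic."""

    def __init__(self, num_envs, obs_shape, clip_obs=10.0, clip_reward=10.0, gamma=0.99, epsilon=1e-8):
        self.obs_rms = RunningMeanStdSketch(shape=obs_shape)
        self.ret_rms = RunningMeanStdSketch(shape=())
        self.returns = np.zeros(num_envs)  # running discounted return per env
        self.clip_obs, self.clip_reward = clip_obs, clip_reward
        self.gamma, self.epsilon = gamma, epsilon
        self.training = True

    def normalize_obs(self, obs):
        # Uses the current statistics without updating them.
        scaled = (obs - self.obs_rms.mean) / np.sqrt(self.obs_rms.var + self.epsilon)
        return np.clip(scaled, -self.clip_obs, self.clip_obs)

    def normalize_reward(self, reward):
        # Scale only (no centering) by the std of the discounted return.
        scaled = reward / np.sqrt(self.ret_rms.var + self.epsilon)
        return np.clip(scaled, -self.clip_reward, self.clip_reward)

    def process_step(self, obs, reward, done):
        if self.training:
            self.obs_rms.update(obs)
            self.returns = self.returns * self.gamma + reward
            self.ret_rms.update(self.returns)
        norm_obs, norm_reward = self.normalize_obs(obs), self.normalize_reward(reward)
        self.returns[done] = 0.0  # a finished episode starts a new return
        return norm_obs, norm_reward


norm = NormalizeSketch(num_envs=2, obs_shape=(1,))
obs, reward = norm.process_step(np.array([[1.0], [3.0]]), np.array([1.0, 1.0]), np.array([False, True]))
print(obs.ravel())  # values close to -1 and 1
print(reward)       # [10. 10.]  (clipped: the return variance is still tiny)
```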
get_original_obs()
Returns an unnormalized version of the observations from the most recent step or reset.
- Return type
  Union[ndarray, Dict[str, ndarray]]
get_original_reward()
Returns an unnormalized version of the rewards from the most recent step.
- Return type
  ndarray
static load(load_path, venv)
Loads a saved VecNormalize object.
- Parameters
  load_path (str) – the path to load from.
  venv (VecEnv) – the VecEnv to wrap.
- Return type
  VecNormalize
normalize_obs(obs)
Normalize observations using this VecNormalize’s observations statistics. Calling this method does not update statistics.
- Return type
  Union[ndarray, Dict[str, ndarray]]
normalize_reward(reward)
Normalize rewards using this VecNormalize’s rewards statistics. Calling this method does not update statistics.
- Return type
  ndarray
reset()
Reset all environments.
- Return type
  Union[ndarray, Dict[str, ndarray]]
- Returns
  first observation of the episode
save(save_path)
Save current VecNormalize object with all running statistics and settings (e.g. clip_obs)
- Parameters
  save_path (str) – The path to save to
- Return type
  None
VecVideoRecorder
class stable_baselines3.common.vec_env.VecVideoRecorder(venv, video_folder, record_video_trigger, video_length=200, name_prefix='rl-video')
Wraps a VecEnv or VecEnvWrapper object to record rendered images as mp4 video. It requires ffmpeg or avconv to be installed on the machine.
- Parameters
  venv (VecEnv) – the vectorized environment to wrap
  video_folder (str) – Where to save videos
  record_video_trigger (Callable[[int], bool]) – Function that defines when to start recording. The function takes the current step number and returns whether recording should start.
  video_length (int) – Length of recorded videos
  name_prefix (str) – Prefix to the video name
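record_video_trigger only needs to map the current step number to a boolean. A minimal sketch of a periodic trigger follows; every_n_steps is a hypothetical helper, and with the documented signature it would be passed as record_video_trigger=every_n_steps(2000).

```python
from typing import Callable


def every_n_steps(n: int) -> Callable[[int], bool]:
    """Return a trigger that starts a recording whenever the step count is a multiple of n."""
    def trigger(step: int) -> bool:
        return step % n == 0
    return trigger


trigger = every_n_steps(2000)
print([step for step in range(0, 6001, 1000) if trigger(step)])  # [0, 2000, 4000, 6000]
```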
reset()
Reset all the environments and return an array of observations, or a dict or tuple of observation arrays.
If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.
- Return type
  Union[ndarray, Dict[str, ndarray], Tuple[ndarray, …]]
- Returns
  observation
VecCheckNan
class stable_baselines3.common.vec_env.VecCheckNan(venv, raise_exception=False, warn_once=True, check_inf=True)
NaN and inf checking wrapper for vectorized environment. By default it raises a warning, letting you know where the NaN or inf originated from.
- Parameters
  venv (VecEnv) – the vectorized environment to wrap
  raise_exception (bool) – Whether or not to raise a ValueError, instead of a UserWarning
  warn_once (bool) – Whether or not to only warn once.
  check_inf (bool) – Whether or not to check for +inf or -inf as well
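A NumPy sketch of the kind of check VecCheckNan performs on a single array, mirroring the documented raise_exception and check_inf options. check_array is a hypothetical helper; it does not model warn_once or track which step produced the invalid value.

```python
import warnings

import numpy as np


def check_array(name: str, value: np.ndarray, raise_exception: bool = False, check_inf: bool = True) -> bool:
    """Return True if `value` is clean; otherwise warn (or raise ValueError) naming its source."""
    problems = []
    if np.isnan(value).any():
        problems.append("NaN")
    if check_inf and np.isinf(value).any():
        problems.append("inf")
    if not problems:
        return True
    message = f"found {' and '.join(problems)} in {name}"
    if raise_exception:
        raise ValueError(message)
    warnings.warn(message, UserWarning)
    return False


print(check_array("actions", np.array([0.0, 1.0])))  # True
```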
reset()
Reset all the environments and return an array of observations, or a dict or tuple of observation arrays.
If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.
- Return type
  Union[ndarray, Dict[str, ndarray], Tuple[ndarray, …]]
- Returns
  observation
VecTransposeImage
class stable_baselines3.common.vec_env.VecTransposeImage(venv)
Re-order channels, from HxWxC to CxHxW. It is required for PyTorch convolution layers.
- Parameters
  venv (VecEnv) – the vectorized environment to wrap
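The channel re-ordering is a single axis permutation once the leading environment dimension is taken into account. transpose_batch is a hypothetical helper sketching the HxWxC to CxHxW conversion for a batch shaped (n_envs, H, W, C).

```python
import numpy as np


def transpose_batch(obs: np.ndarray) -> np.ndarray:
    """Reorder a batch of images from (n_envs, H, W, C) to (n_envs, C, H, W)."""
    # Axis 0 (the environment index) stays first; the channel axis moves in front of H and W.
    return np.transpose(obs, (0, 3, 1, 2))


batch = np.zeros((4, 84, 84, 3), dtype=np.uint8)
print(transpose_batch(batch).shape)  # (4, 3, 84, 84)
```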
step_wait()
Wait for the step taken with step_async().
- Return type
  Tuple[Union[ndarray, Dict[str, ndarray], Tuple[ndarray, …]], ndarray, ndarray, List[Dict]]
- Returns
  observation, reward, done, information