Vectorized Environments
Vectorized Environments are a method for stacking multiple independent environments into a single environment.
Instead of training an RL agent on 1 environment per step, it allows us to train it on n environments per step.
Because of this, actions passed to the environment are now a vector (of dimension n).
It is the same for observations, rewards and end of episode signals (dones).
In the case of non-array observation spaces such as Dict or Tuple, where different sub-spaces may have different shapes, the sub-observations are vectors (of dimension n).
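As a rough illustration of the batching described above, the following pure-Python sketch stacks per-environment observations into one batch. The helper name stack_obs is hypothetical and is not part of SB3, which works on NumPy arrays; plain lists are used here only to keep the sketch free of dependencies.

```python
from typing import Any, Dict, List, Union

Obs = Union[List[float], Dict[str, List[float]]]


def stack_obs(per_env_obs: List[Obs]) -> Any:
    """Stack n per-environment observations into one batched observation.

    Hypothetical helper (not an SB3 function): array-like observations become
    a list of length n; Dict observations become a dict of lists of length n.
    """
    first = per_env_obs[0]
    if isinstance(first, dict):
        # Each sub-observation is batched separately, since sub-spaces may differ in shape
        return {key: [obs[key] for obs in per_env_obs] for key in first}
    return list(per_env_obs)


# Three environments (n = 3), each returning a 2-dimensional observation
batched = stack_obs([[0.0, 1.0], [0.5, 1.5], [1.0, 2.0]])
print(len(batched))  # size of the batch dimension

# Dict observations: every key holds a batch of n sub-observations
batched_dict = stack_obs(
    [
        {"position": [0.0, 0.0], "goal": [1.0]},
        {"position": [0.1, 0.2], "goal": [2.0]},
        {"position": [0.3, 0.4], "goal": [3.0]},
    ]
)
print({key: len(value) for key, value in batched_dict.items()})
```

In both cases the leading dimension has size n, which is why a policy can process all environments in a single forward pass.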
Name | Box | Discrete | Dict | Tuple | Multi Processing
---|---|---|---|---|---
DummyVecEnv | ✔️ | ✔️ | ✔️ | ✔️ | ❌️
SubprocVecEnv | ✔️ | ✔️ | ✔️ | ✔️ | ✔️
Note
Vectorized environments are required when using wrappers for frame-stacking or normalization.
Note
When using vectorized environments, the environments are automatically reset at the end of each episode.
Thus, the observation returned for the i-th environment when done[i] is true will in fact be the first observation of the next episode, not the last observation of the episode that has just terminated.
You can access the “real” final observation of the terminated episode (that is, the one that accompanied the done event provided by the underlying environment) using the terminal_observation keys in the info dicts returned by the VecEnv.
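The auto-reset behavior described in this note can be sketched with a toy, dependency-free loop. CountingEnv and vec_step are hypothetical names used only for illustration; they mimic the described semantics and are not SB3's implementation.

```python
from typing import Any, Dict, List, Tuple


class CountingEnv:
    """Toy single environment: the observation is a step counter, episodes last `horizon` steps."""

    def __init__(self, horizon: int) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: int) -> Tuple[int, float, bool, Dict[str, Any]]:
        self.t += 1
        done = self.t >= self.horizon
        return self.t, 1.0, done, {}


def vec_step(envs: List[CountingEnv], actions: List[int]):
    """Step every env; at the end of an episode, store the last observation and auto-reset."""
    obs, rewards, dones, infos = [], [], [], []
    for env, action in zip(envs, actions):
        ob, reward, done, info = env.step(action)
        if done:
            # Keep the true final observation before it is replaced by the reset observation
            info["terminal_observation"] = ob
            ob = env.reset()
        obs.append(ob)
        rewards.append(reward)
        dones.append(done)
        infos.append(info)
    return obs, rewards, dones, infos


envs = [CountingEnv(horizon=2), CountingEnv(horizon=3)]
for env in envs:
    env.reset()

vec_step(envs, [0, 0])  # first step: no episode ends
obs, rewards, dones, infos = vec_step(envs, [0, 0])  # env 0 reaches its horizon
print(obs, dones, infos[0]["terminal_observation"])
```

For env 0, the returned observation is already the first observation of the next episode, while the final observation of the finished episode is only available in its info dict.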
Warning
When defining a custom VecEnv (for instance, using gym3 ProcgenEnv), you should provide terminal_observation keys in the info dicts returned by the VecEnv (cf. note above).
Warning
When using SubprocVecEnv, users must wrap the code in an if __name__ == "__main__": block if using the forkserver or spawn start method (default on Windows).
On Linux, the default start method is fork, which is not thread-safe and can create deadlocks.
For more information, see Python’s multiprocessing guidelines.
VecEnv API vs Gym API
For consistency across Stable-Baselines3 (SB3) versions and because of its special requirements and features, the SB3 VecEnv API is not the same as the Gym API. The SB3 VecEnv API is actually close to the Gym 0.21 API but differs from the Gym 0.26+ API:
- the reset() method only returns the observation (obs = vec_env.reset()) and not a tuple; the infos at reset are stored in vec_env.reset_infos.
- only the initial call to vec_env.reset() is required; environments are reset automatically afterward (and reset_infos is updated automatically).
- the vec_env.step(actions) method expects an array as input (with a batch size corresponding to the number of environments) and returns a 4-tuple (and not a 5-tuple): obs, rewards, dones, infos instead of obs, reward, terminated, truncated, info, where dones = terminated or truncated (for each env). obs, rewards, dones are NumPy arrays with shape (n_envs, shape_for_single_env) (so with a batch dimension). Additional information is passed via the infos value, which is a list of dictionaries.
- at the end of an episode, infos[env_idx]["TimeLimit.truncated"] = truncated and not terminated tells the user if an episode was truncated or not: you should bootstrap if infos[env_idx]["TimeLimit.truncated"] is True (episode over due to a timeout/truncation) or dones[env_idx] is False (episode not finished). Note: compared to Gym 0.26+, infos[env_idx]["TimeLimit.truncated"] and terminated are mutually exclusive. The conversion from SB3 to Gym API is

    # done is True at the end of an episode
    # dones[env_idx] = terminated[env_idx] or truncated[env_idx]
    # In SB3, truncated and terminated are mutually exclusive
    # infos[env_idx]["TimeLimit.truncated"] = truncated and not terminated
    # terminated[env_idx] tells you whether you should bootstrap or not:
    # when the episode has not ended or when the termination was a timeout/truncation
    terminated[env_idx] = dones[env_idx] and not infos[env_idx]["TimeLimit.truncated"]
    should_bootstrap[env_idx] = not terminated[env_idx]

- at the end of an episode, because the environment resets automatically, we provide infos[env_idx]["terminal_observation"], which contains the last observation of an episode (and can be used when bootstrapping, see note in the previous section).
- to overcome the current Gymnasium limitation (only one render mode allowed per env instance, see issue #100), we recommend using render_mode="rgb_array" since we can both have the image as a numpy array and display it with OpenCV. If no mode is passed or mode="rgb_array" is passed when calling vec_env.render, then we use the default mode; otherwise, we use the OpenCV display. Note that if render_mode != "rgb_array", you can only call vec_env.render() (without argument or with mode=env.render_mode).
- the reset() method doesn’t take any parameter. If you want to seed the pseudo-random generator or pass options, you should call vec_env.seed(seed=seed) / vec_env.set_options(options) and obs = vec_env.reset() afterward (seed and options are discarded after each call to reset()).
- methods and attributes of the underlying Gym envs can be accessed, called and set using vec_env.get_attr("attribute_name"), vec_env.env_method("method_name", args1, args2, kwargs1=kwargs1) and vec_env.set_attr("attribute_name", new_value).
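The bootstrapping rule from the list above can be sketched as follows. split_dones, td_targets and value_fn are hypothetical names, not SB3 functions; the helpers accept any sequence (plain lists here, but NumPy arrays returned by a VecEnv would work the same way), and a missing "TimeLimit.truncated" key is assumed to mean False.

```python
from typing import Any, Callable, Dict, List, Sequence


def split_dones(dones: Sequence[bool], infos: List[Dict[str, Any]]):
    """Recover per-env terminated / truncated / should_bootstrap flags from VecEnv outputs."""
    truncated = [bool(d) and bool(info.get("TimeLimit.truncated", False)) for d, info in zip(dones, infos)]
    terminated = [bool(d) and not t for d, t in zip(dones, truncated)]
    should_bootstrap = [not term for term in terminated]
    return terminated, truncated, should_bootstrap


def td_targets(rewards, new_obs, dones, infos, value_fn: Callable, gamma: float = 0.99) -> List[float]:
    """One-step targets r + gamma * V(s'), bootstrapping unless the episode truly terminated."""
    _, _, should_bootstrap = split_dones(dones, infos)
    targets = []
    for i, reward in enumerate(rewards):
        # After an auto-reset, new_obs[i] belongs to the next episode,
        # so the stored final observation is used instead
        next_obs = infos[i]["terminal_observation"] if dones[i] else new_obs[i]
        bootstrap = gamma * value_fn(next_obs) if should_bootstrap[i] else 0.0
        targets.append(reward + bootstrap)
    return targets


dones = [True, True, False]
infos = [
    {"TimeLimit.truncated": True, "terminal_observation": [0.25]},  # timeout
    {"TimeLimit.truncated": False, "terminal_observation": [0.75]},  # true termination
    {},  # episode still running
]
terminated, truncated, should_bootstrap = split_dones(dones, infos)
targets = td_targets(
    rewards=[1.0, 1.0, 1.0],
    new_obs=[[0.0], [0.0], [0.5]],
    dones=dones,
    infos=infos,
    value_fn=lambda obs: 10.0 * obs[0],  # stand-in for a learned value function
    gamma=0.5,
)
print(terminated, should_bootstrap, targets)
```

Only the second environment drops the bootstrap term, because it is the only one whose episode ended for a reason other than a timeout.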
Modifying Vectorized Environments Attributes
If you plan to modify the attributes of an environment while it is used (e.g., modifying an attribute specifying the task carried out for a portion of training when doing multi-task learning, or a parameter of the environment dynamics), you must expose a setter method.
In fact, directly accessing the environment attribute in the callback can lead to unexpected behavior because environments can be wrapped (using gym or VecEnv wrappers, the Monitor wrapper being one example).
Consider the following example for a custom env:
import gymnasium as gym
from gymnasium import spaces

from stable_baselines3.common.env_util import make_vec_env


class MyMultiTaskEnv(gym.Env):
    """
    A state and action space for robotic locomotion.
    The multi-task twist is that the policy would need to adapt to different terrains, each with its own
    friction coefficient, mu.
    The friction coefficient is the only parameter that changes between tasks.
    mu is a scalar between 0 and 1, and during training a callback is used to update mu.
    """

    def __init__(self):
        super().__init__()
        ...

    def step(self, action):
        # Do something, depending on the action and current value of mu the next state is computed
        return self._get_obs(), reward, terminated, truncated, info

    def set_mu(self, new_mu: float) -> None:
        # Note: this value should be used only at the next reset
        self.mu = new_mu


# Example of wrapped env
# env is of type <TimeLimit<OrderEnforcing<PassiveEnvChecker<CartPoleEnv<CartPole-v1>>>>>
env = gym.make("CartPole-v1")
# To access the base env, without wrapper, you should use `.unwrapped`
# or env.get_wrapper_attr("gravity") to include wrappers
env.unwrapped.gravity

# SB3 uses VecEnv for training, where `env.unwrapped.x = new_value` cannot be used to set an attribute
# therefore, you should expose a setter like `set_mu` to properly set an attribute
vec_env = make_vec_env(MyMultiTaskEnv)
# Print current mu value
# Note: you should use vec_env.env_method("get_wrapper_attr", "mu") in Gymnasium v1.0
print(vec_env.env_method("get_wrapper_attr", "mu"))
# Change `mu` attribute via the setter (the setter takes only the new value)
vec_env.env_method("set_mu", 0.1)
In this example env.mu cannot be accessed/changed directly because it is wrapped in a VecEnv and because it could be wrapped with other wrappers (see GH#1573 for a longer explanation).
Instead, the callback should use the set_mu method via the env_method method for Vectorized Environments.
from itertools import cycle

from stable_baselines3.common.callbacks import BaseCallback


class ChangeMuCallback(BaseCallback):
    """
    This callback changes the value of mu during training looping
    through a list of values until training is aborted.
    The environment is implemented so that the impact of changing
    the value of mu mid-episode is visible only after the episode is over
    and the reset method has been called.
    """

    def __init__(self):
        super().__init__()
        # An iterator that cycles through the different values of the friction coefficient
        self.mus = cycle([0.1, 0.2, 0.5, 0.13, 0.9])

    def _on_step(self) -> bool:
        # Note: in practice, you should not change this value at every step
        # but rather depending on some events/metrics like agent performance/episode termination
        # both accessible via the `self.logger` or `self.locals` variables
        self.training_env.env_method("set_mu", next(self.mus))
        # Returning True tells SB3 to continue training
        return True
This callback can then be used to safely modify environment attributes during training since it calls the environment setter method.
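To see why a setter is safer than direct attribute assignment, consider the toy sketch below. ToyBaseEnv and ToyWrapper are hypothetical stand-ins that only forward attribute reads to the wrapped object; they do not reproduce how Gymnasium or SB3 wrappers are implemented.

```python
class ToyBaseEnv:
    """Stand-in for the unwrapped environment that actually uses mu."""

    def __init__(self) -> None:
        self.mu = 0.5

    def set_mu(self, new_mu: float) -> None:
        self.mu = new_mu


class ToyWrapper:
    """Stand-in for a wrapper: attribute reads fall through to the wrapped env."""

    def __init__(self, env) -> None:
        self.env = env

    def __getattr__(self, name):
        # Only called when `name` is not found on the wrapper itself
        return getattr(self.env, name)


wrapped = ToyWrapper(ToyBaseEnv())

# Direct assignment creates a new attribute on the wrapper; the base env is untouched
wrapped.mu = 0.1
mu_after_direct_assignment = wrapped.env.mu

# The setter is looked up on the base env, so the value that matters is updated
wrapped.set_mu(0.1)
mu_after_setter = wrapped.env.mu

print(mu_after_direct_assignment, mu_after_setter)
```

The direct assignment silently shadows the attribute at the wrapper level, which is the kind of unexpected behavior the setter method avoids.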
Vectorized Environments Wrappers
If you want to alter or augment a VecEnv without redefining it completely (e.g. stack multiple frames, monitor the VecEnv, normalize the observation, …), you can use VecEnvWrapper for that.
They are the vectorized equivalents (i.e., they act on multiple environments at the same time) of gym.Wrapper.
You can find below an example for extracting one key from the observation:
import gymnasium as gym
import numpy as np

from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.vec_env.base_vec_env import VecEnv, VecEnvStepReturn, VecEnvWrapper


class VecExtractDictObs(VecEnvWrapper):
    """
    A vectorized wrapper for filtering a specific key from dictionary observations.
    Similar to Gym's FilterObservation wrapper:
    https://github.com/openai/gym/blob/master/gym/wrappers/filter_observation.py

    :param venv: The vectorized environment
    :param key: The key of the dictionary observation
    """

    def __init__(self, venv: VecEnv, key: str):
        self.key = key
        # The wrapped env exposes only the sub-space selected by `key`
        super().__init__(venv=venv, observation_space=venv.observation_space.spaces[self.key])

    def reset(self) -> np.ndarray:
        obs = self.venv.reset()
        return obs[self.key]

    def step_async(self, actions: np.ndarray) -> None:
        self.venv.step_async(actions)

    def step_wait(self) -> VecEnvStepReturn:
        obs, reward, done, info = self.venv.step_wait()
        return obs[self.key], reward, done, info


env = DummyVecEnv([lambda: gym.make("FetchReach-v1")])
# Wrap the VecEnv
env = VecExtractDictObs(env, key="observation")
VecEnv
- class stable_baselines3.common.vec_env.VecEnv(num_envs, observation_space, action_space)[source]
An abstract asynchronous, vectorized environment.
- Parameters:
num_envs (int) – Number of environments
observation_space (Space) – Observation space
action_space (Space) – Action space
- abstract env_is_wrapped(wrapper_class, indices=None)[source]
Check if environments are wrapped with a given wrapper.
- Parameters:
wrapper_class (Type[Wrapper]) – The wrapper class to look for
indices (None | int | Iterable[int]) – Indices of envs to check
- Returns:
True if the env is wrapped, False otherwise, for each env queried.
- Return type:
List[bool]
- abstract env_method(method_name, *method_args, indices=None, **method_kwargs)[source]
Call instance methods of vectorized environments.
- Parameters:
method_name (str) – The name of the environment method to invoke.
indices (None | int | Iterable[int]) – Indices of envs whose method to call
method_args – Any positional arguments to provide in the call
method_kwargs – Any keyword arguments to provide in the call
- Returns:
List of items returned by the environment’s method call
- Return type:
List[Any]
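The indices argument shared by env_method, get_attr and set_attr accepts None, a single int, or an iterable of ints. A minimal sketch of how such an argument could be resolved and used for dispatch is shown below; resolve_indices, call_env_method and ToyEnv are hypothetical names and this is not SB3's implementation.

```python
from typing import Any, Iterable, List, Union


def resolve_indices(indices: Union[None, int, Iterable[int]], num_envs: int) -> List[int]:
    """Turn None / int / iterable into an explicit list of env indices."""
    if indices is None:
        return list(range(num_envs))  # None selects every environment
    if isinstance(indices, int):
        return [indices]
    return list(indices)


def call_env_method(envs, method_name: str, *method_args, indices=None, **method_kwargs) -> List[Any]:
    """Call `method_name` on the selected envs and collect one result per env."""
    selected = resolve_indices(indices, len(envs))
    return [getattr(envs[i], method_name)(*method_args, **method_kwargs) for i in selected]


class ToyEnv:
    """Toy environment exposing a single method to call."""

    def __init__(self, gain: float) -> None:
        self.gain = gain

    def scale(self, x: float, offset: float = 0.0) -> float:
        return self.gain * x + offset


envs = [ToyEnv(1.0), ToyEnv(2.0), ToyEnv(3.0)]
all_results = call_env_method(envs, "scale", 2.0)
one_result = call_env_method(envs, "scale", 2.0, indices=1, offset=1.0)
some_results = call_env_method(envs, "scale", 1.0, indices=[0, 2])
print(all_results, one_result, some_results)
```

Note that the result is always a list, even when a single index is given, which matches the List[Any] return type documented above.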
- abstract get_attr(attr_name, indices=None)[source]
Return attribute from vectorized environment.
- Parameters:
attr_name (str) – The name of the attribute whose value to return
indices (None | int | Iterable[int]) – Indices of envs to get attribute from
- Returns:
List of values of ‘attr_name’ in all environments
- Return type:
List[Any]
- get_images()[source]
Return RGB images from each environment when available
- Return type:
Sequence[ndarray | None]
- getattr_depth_check(name, already_found)[source]
Check if an attribute reference is being hidden in a recursive call to __getattr__
- Parameters:
name (str) – name of attribute to check for
already_found (bool) – whether this attribute has already been found in a wrapper
- Returns:
name of module whose attribute is being shadowed, if any.
- Return type:
str | None
- render(mode=None)[source]
Gym environment rendering
- Parameters:
mode (str | None) – the rendering type
- Return type:
ndarray | None
- abstract reset()[source]
Reset all the environments and return an array of observations, or a tuple of observation arrays.
If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.
- Returns:
observation
- Return type:
ndarray | Dict[str, ndarray] | Tuple[ndarray, …]
- seed(seed=None)[source]
Sets the random seeds for all environments, based on a given seed. Each individual environment will still get its own seed, by incrementing the given seed. WARNING: since gym 0.26, those seeds will only be passed to the environment at the next reset.
- Parameters:
seed (int | None) – The random seed. May be None for completely random seeding.
- Returns:
Returns a list containing the seeds for each individual env. Note that all list elements may be None, if the env does not return anything when being seeded.
- Return type:
Sequence[None | int]
- abstract set_attr(attr_name, value, indices=None)[source]
Set attribute inside vectorized environments.
- Parameters:
attr_name (str) – The name of attribute to assign new value
value (Any) – Value to assign to attr_name
indices (None | int | Iterable[int]) – Indices of envs to assign value
- Returns:
- Return type:
None
- set_options(options=None)[source]
Set environment options for all environments. If a dict is passed instead of a list, the same options will be used for all environments. WARNING: Those options will only be passed to the environment at the next reset.
- Parameters:
options (List[Dict] | Dict | None) – A dictionary of environment options to pass to each environment at the next reset.
- Return type:
None
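The per-environment seeding and options rules described for seed() and set_options() can be sketched as below. per_env_seeds and per_env_options are hypothetical helpers, not SB3 functions; in particular, returning None for every env when seed is None is an assumption made to keep the sketch deterministic.

```python
from copy import deepcopy
from typing import Any, Dict, List, Optional, Union


def per_env_seeds(seed: Optional[int], num_envs: int) -> List[Optional[int]]:
    """Give each env its own seed by incrementing the base seed."""
    if seed is None:
        # Assumption for this sketch: no base seed means no per-env seed
        return [None] * num_envs
    return [seed + idx for idx in range(num_envs)]


def per_env_options(
    options: Union[None, Dict[str, Any], List[Dict[str, Any]]], num_envs: int
) -> List[Dict[str, Any]]:
    """Expand a single dict to one independent copy per env; pass a list through as copies."""
    if options is None:
        options = {}
    if isinstance(options, dict):
        # Independent copies, so one env mutating its options cannot affect the others
        return [deepcopy(options) for _ in range(num_envs)]
    return deepcopy(list(options))


seeds = per_env_seeds(42, num_envs=3)
shared = per_env_options({"difficulty": 2}, num_envs=2)
individual = per_env_options([{"difficulty": 1}, {"difficulty": 3}], num_envs=2)
print(seeds, shared, individual)
```

Both values only take effect at the next reset, so they should be set right before calling vec_env.reset().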
- step(actions)[source]
Step the environments with the given action
- Parameters:
actions (ndarray) – the action
- Returns:
observation, reward, done, information
- Return type:
Tuple[ndarray | Dict[str, ndarray] | Tuple[ndarray, …], ndarray, ndarray, List[Dict]]
DummyVecEnv
- class stable_baselines3.common.vec_env.DummyVecEnv(env_fns)[source]
Creates a simple vectorized wrapper for multiple environments, calling each environment in sequence on the current Python process. This is useful for computationally simple environments such as Cartpole-v1, as the overhead of multiprocessing or multithreading outweighs the environment computation time. This can also be used for RL methods that require a vectorized environment when you only want a single environment to train with.
- Parameters:
env_fns (List[Callable[[], Env]]) – a list of functions that return environments to vectorize
- Raises:
ValueError – If the same environment instance is passed as the output of two or more different env_fn.
- env_is_wrapped(wrapper_class, indices=None)[source]
Check if worker environments are wrapped with a given wrapper
- Parameters:
wrapper_class (Type[Wrapper]) –
indices (None | int | Iterable[int]) –
- Return type:
List[bool]
- env_method(method_name, *method_args, indices=None, **method_kwargs)[source]
Call instance methods of vectorized environments.
- Parameters:
method_name (str) –
indices (None | int | Iterable[int]) –
- Return type:
List[Any]
- get_attr(attr_name, indices=None)[source]
Return attribute from vectorized environment (see base class).
- Parameters:
attr_name (str) –
indices (None | int | Iterable[int]) –
- Return type:
List[Any]
- get_images()[source]
Return RGB images from each environment when available
- Return type:
Sequence[ndarray | None]
- render(mode=None)[source]
Gym environment rendering. If there are multiple environments then they are tiled together in one image via BaseVecEnv.render().
- Parameters:
mode (str | None) – The rendering type.
- Return type:
ndarray | None
- reset()[source]
Reset all the environments and return an array of observations, or a tuple of observation arrays.
If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.
- Returns:
observation
- Return type:
ndarray | Dict[str, ndarray] | Tuple[ndarray, …]
- set_attr(attr_name, value, indices=None)[source]
Set attribute inside vectorized environments (see base class).
- Parameters:
attr_name (str) –
value (Any) –
indices (None | int | Iterable[int]) –
- Return type:
None
SubprocVecEnv
- class stable_baselines3.common.vec_env.SubprocVecEnv(env_fns, start_method=None)[source]
Creates a multiprocess vectorized wrapper for multiple environments, distributing each environment to its own process, allowing significant speed up when the environment is computationally complex.
For performance reasons, if your environment is not IO bound, the number of environments should not exceed the number of logical cores on your CPU.
Warning
Only ‘forkserver’ and ‘spawn’ start methods are thread-safe, which is important when TensorFlow sessions or other non thread-safe libraries are used in the parent (see issue #217). However, compared to ‘fork’ they incur a small start-up cost and have restrictions on global variables. With those methods, users must wrap the code in an if __name__ == "__main__": block. For more information, see the multiprocessing documentation.
- Parameters:
env_fns (List[Callable[[], Env]]) – Environments to run in subprocesses
start_method (str | None) – method used to start the subprocesses. Must be one of the methods returned by multiprocessing.get_all_start_methods(). Defaults to ‘forkserver’ on available platforms, and ‘spawn’ otherwise.
- env_is_wrapped(wrapper_class, indices=None)[source]
Check if worker environments are wrapped with a given wrapper
- Parameters:
wrapper_class (Type[Wrapper]) –
indices (None | int | Iterable[int]) –
- Return type:
List[bool]
- env_method(method_name, *method_args, indices=None, **method_kwargs)[source]
Call instance methods of vectorized environments.
- Parameters:
method_name (str) –
indices (None | int | Iterable[int]) –
- Return type:
List[Any]
- get_attr(attr_name, indices=None)[source]
Return attribute from vectorized environment (see base class).
- Parameters:
attr_name (str) –
indices (None | int | Iterable[int]) –
- Return type:
List[Any]
- get_images()[source]
Return RGB images from each environment when available
- Return type:
Sequence[ndarray | None]
- reset()[source]
Reset all the environments and return an array of observations, or a tuple of observation arrays.
If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.
- Returns:
observation
- Return type:
ndarray | Dict[str, ndarray] | Tuple[ndarray, …]
- set_attr(attr_name, value, indices=None)[source]
Set attribute inside vectorized environments (see base class).
- Parameters:
attr_name (str) –
value (Any) –
indices (None | int | Iterable[int]) –
- Return type:
None
Wrappers
VecFrameStack
- class stable_baselines3.common.vec_env.VecFrameStack(venv, n_stack, channels_order=None)[source]
Frame stacking wrapper for vectorized environment. Designed for image observations.
- Parameters:
venv (VecEnv) – Vectorized environment to wrap
n_stack (int) – Number of frames to stack
channels_order (str | Mapping[str, str] | None) – If “first”, stack on first image dimension. If “last”, stack on last dimension. If None, automatically detect channel to stack over in case of image observation or default to “last” (default). Alternatively channels_order can be a dictionary which can be used with environments with Dict observation spaces
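The idea behind frame stacking can be sketched for a single environment with flat frames, as below. FrameStacker is a hypothetical class, not SB3's VecFrameStack; it assumes zero padding at reset and stacking along the last dimension, and it ignores the batch dimension and the image-specific channel handling.

```python
from collections import deque
from typing import Deque, List


class FrameStacker:
    """Keep the last `n_stack` frames of one environment, oldest first."""

    def __init__(self, n_stack: int, frame_size: int) -> None:
        self.n_stack = n_stack
        self.frame_size = frame_size
        self.frames: Deque[List[float]] = deque(maxlen=n_stack)

    def reset(self, frame: List[float]) -> List[float]:
        # Assumption: the history is zero-filled and the reset frame goes last
        self.frames.clear()
        for _ in range(self.n_stack - 1):
            self.frames.append([0.0] * self.frame_size)
        self.frames.append(list(frame))
        return self.stacked()

    def update(self, frame: List[float]) -> List[float]:
        # The deque drops the oldest frame automatically once it is full
        self.frames.append(list(frame))
        return self.stacked()

    def stacked(self) -> List[float]:
        # Concatenate along the last axis ("last" order for flat frames)
        return [value for frame in self.frames for value in frame]


stacker = FrameStacker(n_stack=3, frame_size=2)
first = stacker.reset([1.0, 2.0])
second = stacker.update([3.0, 4.0])
third = stacker.update([5.0, 6.0])
fourth = stacker.update([7.0, 8.0])
print(first, second, third, fourth)
```

The stacked observation is n_stack times larger than a single frame along the stacking dimension, which is what the wrapped observation space has to reflect.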
StackedObservations
- class stable_baselines3.common.vec_env.stacked_observations.StackedObservations(num_envs, n_stack, observation_space, channels_order=None)[source]
Frame stacking wrapper for data.
Dimension to stack over is either first (channels-first) or last (channels-last), which is detected automatically using common.preprocessing.is_image_space_channels_first if observation is an image space.
- Parameters:
num_envs – Number of environments
n_stack – Number of frames to stack
observation_space – Environment observation space
channels_order – If “first”, stack on first image dimension. If “last”, stack on last dimension. If None, automatically detect channel to stack over in case of image observation or default to “last”. For Dict space, channels_order can also be a dictionary.
- static compute_stacking(n_stack, observation_space, channels_order=None)[source]
Calculates the parameters in order to stack observations
- Parameters:
n_stack (int) – Number of observations to stack
observation_space (Box) – Observation space
channels_order (str | None) – Order of the channels
- Returns:
Tuple of channels_first, stack_dimension, stackedobs, repeat_axis
- Return type:
Tuple[bool, int, Tuple[int, …], int]
- reset(observation)[source]
Reset the stacked_obs, add the reset observation to the stack, and return the stack.
- Parameters:
observation (TObs) – Reset observation
- Returns:
The stacked reset observation
- Return type:
TObs
- update(observations, dones, infos)[source]
Add the observations to the stack and use the dones to update the infos.
- Parameters:
observations (TObs) – Observations
dones (ndarray) – Dones
infos (List[Dict[str, Any]]) – Infos
- Returns:
Tuple of the stacked observations and the updated infos
- Return type:
Tuple[TObs, List[Dict[str, Any]]]
VecNormalize
- class stable_baselines3.common.vec_env.VecNormalize(venv, training=True, norm_obs=True, norm_reward=True, clip_obs=10.0, clip_reward=10.0, gamma=0.99, epsilon=1e-08, norm_obs_keys=None)[source]
A moving average, normalizing wrapper for vectorized environments. It supports saving and loading the moving average statistics.
- Parameters:
venv (VecEnv) – the vectorized environment to wrap
training (bool) – Whether to update or not the moving average
norm_obs (bool) – Whether to normalize observation or not (default: True)
norm_reward (bool) – Whether to normalize rewards or not (default: True)
clip_obs (float) – Max absolute value for observation
clip_reward (float) – Max absolute value for discounted reward
gamma (float) – discount factor
epsilon (float) – To avoid division by zero
norm_obs_keys (List[str] | None) – Which keys from observation dict to normalize. If not specified, all keys will be normalized.
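The role of the training, clip_obs and epsilon parameters can be illustrated with a scalar sketch. RunningStat and normalize are hypothetical names, not SB3 classes; the sketch assumes that observations are standardized with running statistics and then clipped to [-clip_obs, clip_obs], and it leaves reward normalization out.

```python
import math
from typing import List


class RunningStat:
    """Running mean and variance of a scalar stream, updated batch by batch."""

    def __init__(self) -> None:
        self.mean = 0.0
        self.var = 1.0
        self.count = 1e-4  # small prior count to avoid division by zero

    def update(self, batch: List[float]) -> None:
        batch_count = len(batch)
        batch_mean = sum(batch) / batch_count
        batch_var = sum((x - batch_mean) ** 2 for x in batch) / batch_count
        delta = batch_mean - self.mean
        total = self.count + batch_count
        # Combine the old statistics and the batch statistics (parallel variance formula)
        new_mean = self.mean + delta * batch_count / total
        m2 = self.var * self.count + batch_var * batch_count + delta**2 * self.count * batch_count / total
        self.mean, self.var, self.count = new_mean, m2 / total, total


def normalize(value: float, stat: RunningStat, clip: float = 10.0, epsilon: float = 1e-8) -> float:
    """Standardize with the running statistics, then clip to [-clip, clip]."""
    standardized = (value - stat.mean) / math.sqrt(stat.var + epsilon)
    return max(-clip, min(clip, standardized))


stat = RunningStat()
stat.update([1.0, 3.0])  # done only while training=True; skipped at evaluation time

centered = normalize(2.0, stat)
clipped_high = normalize(1000.0, stat)
clipped_low = normalize(-1000.0, stat)
print(stat.mean, stat.var, centered, clipped_high, clipped_low)
```

Because the statistics are part of the wrapper state, they have to be saved with the model and reloaded at test time, with updates disabled, for the policy to see inputs on the same scale as during training.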
- get_original_obs()[source]
Returns an unnormalized version of the observations from the most recent step or reset.
- Return type:
ndarray | Dict[str, ndarray]
- get_original_reward()[source]
Returns an unnormalized version of the rewards from the most recent step.
- Return type:
ndarray
- static load(load_path, venv)[source]
Loads a saved VecNormalize object.
- Parameters:
load_path (str) – the path to load from.
venv (VecEnv) – the VecEnv to wrap.
- Returns:
the loaded VecNormalize object
- Return type:
VecNormalize
- normalize_obs(obs)[source]
Normalize observations using this VecNormalize’s observations statistics. Calling this method does not update statistics.
- Parameters:
obs (ndarray | Dict[str, ndarray]) –
- Return type:
ndarray | Dict[str, ndarray]
- normalize_reward(reward)[source]
Normalize rewards using this VecNormalize’s rewards statistics. Calling this method does not update statistics.
- Parameters:
reward (ndarray) –
- Return type:
ndarray
- reset()[source]
Reset all environments.
- Returns:
first observation of the episode
- Return type:
ndarray | Dict[str, ndarray]
- save(save_path)[source]
Save current VecNormalize object with all running statistics and settings (e.g. clip_obs)
- Parameters:
save_path (str) – The path to save to
- Return type:
None
VecVideoRecorder
- class stable_baselines3.common.vec_env.VecVideoRecorder(venv, video_folder, record_video_trigger, video_length=200, name_prefix='rl-video')[source]
Wraps a VecEnv or VecEnvWrapper object to record rendered image as mp4 video. It requires ffmpeg or avconv to be installed on the machine.
- Parameters:
venv (VecEnv) –
video_folder (str) – Where to save videos
record_video_trigger (Callable[[int], bool]) – Function that defines when to start recording. The function takes the current number of step, and returns whether we should start recording or not.
video_length (int) – Length of recorded videos
name_prefix (str) – Prefix to the video name
- reset()[source]
Reset all the environments and return an array of observations, or a tuple of observation arrays.
If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.
- Returns:
observation
- Return type:
ndarray | Dict[str, ndarray] | Tuple[ndarray, …]
VecCheckNan
- class stable_baselines3.common.vec_env.VecCheckNan(venv, raise_exception=False, warn_once=True, check_inf=True)[source]
NaN and inf checking wrapper for vectorized environments. It raises a warning by default, allowing you to know where the NaN or inf originated.
- Parameters:
venv (VecEnv) – the vectorized environment to wrap
raise_exception (bool) – Whether to raise a ValueError, instead of a UserWarning
warn_once (bool) – Whether to only warn once.
check_inf (bool) – Whether to check for +inf or -inf as well
- check_array_value(name, value)[source]
Check for inf and NaN for a single numpy array.
- Parameters:
name (str) – Name of the value being checked
value (ndarray) – Value (numpy array) to check
- Returns:
A list of issues found.
- Return type:
List[Tuple[str, str]]
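A check of this kind can be sketched in pure Python as follows. check_values is a hypothetical helper operating on a flat sequence of floats rather than a NumPy array, and the "inf" and "nan" labels in the returned tuples are assumptions made for this sketch.

```python
import math
from typing import Iterable, List, Tuple


def check_values(name: str, values: Iterable[float], check_inf: bool = True) -> List[Tuple[str, str]]:
    """Return (name, issue) pairs for every kind of invalid value found."""
    values = list(values)
    found = []
    if check_inf and any(math.isinf(v) for v in values):
        found.append((name, "inf"))
    if any(math.isnan(v) for v in values):
        found.append((name, "nan"))
    return found


clean = check_values("observations", [0.1, -2.0, 3.5])
bad_reward = check_values("rewards", [1.0, float("nan")])
bad_action = check_values("actions", [float("inf"), float("nan")])
ignored_inf = check_values("actions", [float("-inf")], check_inf=False)
print(clean, bad_reward, bad_action, ignored_inf)
```

Tagging each issue with the name of the checked value is what makes it possible to tell whether the invalid number came from the actions, the observations or the rewards.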
- reset()[source]
Reset all the environments and return an array of observations, or a tuple of observation arrays.
If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.
- Returns:
observation
- Return type:
ndarray | Dict[str, ndarray] | Tuple[ndarray, …]
VecTransposeImage
- class stable_baselines3.common.vec_env.VecTransposeImage(venv, skip=False)[source]
Re-order channels, from HxWxC to CxHxW. It is required for PyTorch convolution layers.
- Parameters:
venv (VecEnv) –
skip (bool) – Skip this wrapper if needed as we rely on heuristic to apply it or not, which may result in unwanted behavior, see GH issue #671.
- step_wait()[source]
Wait for the step taken with step_async().
- Returns:
observation, reward, done, information
- Return type:
Tuple[ndarray | Dict[str, ndarray] | Tuple[ndarray, …], ndarray, ndarray, List[Dict]]
- static transpose_image(image)[source]
Transpose an image or batch of images (re-order channels).
- Parameters:
image (ndarray) –
- Returns:
- Return type:
ndarray
VecMonitor
- class stable_baselines3.common.vec_env.VecMonitor(venv, filename=None, info_keywords=())[source]
A vectorized monitor wrapper for vectorized Gym environments, it is used to record the episode reward, length, time and other data.
Some environments like openai/procgen or gym3 directly initialize the vectorized environments, without giving us a chance to use the Monitor wrapper. So this class simply does the job of the Monitor wrapper on a vectorized level.
- Parameters:
venv (VecEnv) – The vectorized environment
filename (str | None) – the location to save a log file, can be None for no log
info_keywords (Tuple[str, ...]) – extra information to log, from the information return of env.step()
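Monitoring at the vectorized level amounts to keeping one running return and length per environment. The sketch below illustrates this bookkeeping; track_episodes is a hypothetical function, and writing the statistics under an "episode" key with "r" (return) and "l" (length) entries is an assumption made here, as is the omission of timing and file logging.

```python
from typing import Any, Dict, List


def track_episodes(
    returns: List[float],
    lengths: List[int],
    rewards: List[float],
    dones: List[bool],
    infos: List[Dict[str, Any]],
) -> None:
    """Update per-env episode statistics in place after one vectorized step."""
    for i, (reward, done) in enumerate(zip(rewards, dones)):
        returns[i] += reward
        lengths[i] += 1
        if done:
            # Report the finished episode, then start counting the next one
            infos[i]["episode"] = {"r": returns[i], "l": lengths[i]}
            returns[i] = 0.0
            lengths[i] = 0


n_envs = 2
returns, lengths = [0.0] * n_envs, [0] * n_envs

step1_infos: List[Dict[str, Any]] = [{}, {}]
track_episodes(returns, lengths, rewards=[1.0, 0.5], dones=[False, False], infos=step1_infos)

step2_infos: List[Dict[str, Any]] = [{}, {}]
track_episodes(returns, lengths, rewards=[2.0, 0.5], dones=[True, False], infos=step2_infos)
print(step2_infos, returns, lengths)
```

Since the statistics are accumulated from the batched step outputs, no per-environment wrapper is needed, which is what makes this approach usable with environments that are created already vectorized.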
- reset()[source]
Reset all the environments and return an array of observations, or a tuple of observation arrays.
If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.
- Returns:
observation
- Return type:
ndarray | Dict[str, ndarray] | Tuple[ndarray, …]
VecExtractDictObs
- class stable_baselines3.common.vec_env.VecExtractDictObs(venv, key)[source]
A vectorized wrapper for extracting dictionary observations.
- Parameters:
venv (VecEnv) – The vectorized environment
key (str) – The key of the dictionary observation