Utils
-
stable_baselines3.common.utils.
check_for_correct_spaces
(env: Union[gym.core.Env, stable_baselines3.common.vec_env.base_vec_env.VecEnv], observation_space: gym.spaces.space.Space, action_space: gym.spaces.space.Space)
Checks that the environment has the same spaces as the provided ones. Used by BaseAlgorithm to check that the spaces match after loading the model with a given env. Checked parameters: observation_space and action_space.
- Parameters
env – (GymEnv) Environment to check for valid spaces
observation_space – (gym.spaces.Space) Observation space to check against
action_space – (gym.spaces.Space) Action space to check against
-
stable_baselines3.common.utils.
configure_logger
(verbose: int = 0, tensorboard_log: Optional[str] = None, tb_log_name: str = '', reset_num_timesteps: bool = True) → None
Configure the logger’s outputs.
- Parameters
verbose – (int) the verbosity level: 0 no output, 1 info, 2 debug
tensorboard_log – (str) the log location for tensorboard (if None, no logging)
tb_log_name – (str) the name of the tensorboard log
-
stable_baselines3.common.utils.
constant_fn
(val: float) → Callable
Create a function that returns a constant. It is useful for learning rate schedules (to avoid code duplication).
- Parameters
val – (float) the constant value to return
- Returns
(Callable)
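As a rough illustration of the idea, the sketch below wraps a constant in a callable. The name constant_fn_sketch and the single ignored argument (a progress value, as used by the schedules elsewhere on this page) are assumptions made for the example, not the library's code.

```python
from typing import Callable


def constant_fn_sketch(val: float) -> Callable[[float], float]:
    """Hypothetical stand-in: build a schedule that always returns `val`."""

    def func(_progress_remaining: float) -> float:
        # The argument is ignored on purpose: the schedule is constant.
        return val

    return func


# A constant learning-rate "schedule"
lr_schedule = constant_fn_sketch(3e-4)
lr_at_start = lr_schedule(1.0)
lr_at_end = lr_schedule(0.0)
```

Wrapping the constant this way lets calling code treat fixed values and real schedules uniformly.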
-
stable_baselines3.common.utils.
explained_variance
(y_pred: numpy.ndarray, y_true: numpy.ndarray) → numpy.ndarray
Computes the fraction of the variance of y_true that y_pred explains. Returns 1 - Var[y_true - y_pred] / Var[y_true]
- Interpretation
ev=0 => might as well have predicted zero
ev=1 => perfect prediction
ev<0 => worse than just predicting zero
- Parameters
y_pred – (np.ndarray) the prediction
y_true – (np.ndarray) the expected value
- Returns
(float) explained variance of y_pred and y_true
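The formula can be sketched with NumPy as follows. The function name explained_variance_sketch and the NaN guard for a zero-variance target are assumptions of this example, not necessarily what the library does.

```python
import numpy as np


def explained_variance_sketch(y_pred: np.ndarray, y_true: np.ndarray) -> float:
    """Hypothetical stand-in for 1 - Var[y_true - y_pred] / Var[y_true]."""
    var_y = np.var(y_true)
    if var_y == 0:
        # Assumption: the ratio is undefined for a constant target, so return NaN.
        return float("nan")
    return float(1.0 - np.var(y_true - y_pred) / var_y)


y_true = np.array([1.0, 2.0, 3.0, 4.0])
ev_perfect = explained_variance_sketch(y_true.copy(), y_true)      # perfect prediction
ev_zero = explained_variance_sketch(np.zeros_like(y_true), y_true)  # predicting zero
ev_worse = explained_variance_sketch(-y_true, y_true)               # worse than zero
```

The three calls correspond to the three cases listed under the interpretation.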
-
stable_baselines3.common.utils.
get_device
(device: Union[torch.device, str] = 'auto') → torch.device
Retrieve the PyTorch device. It first checks that the requested device is available. For now, it supports only cpu and cuda. By default, it tries to use the GPU.
- Parameters
device – (Union[str, th.device]) One of ‘auto’, ‘cuda’, ‘cpu’
- Returns
(th.device)
-
stable_baselines3.common.utils.
get_latest_run_id
(log_path: Optional[str] = None, log_name: str = '') → int
Returns the latest run number for the given log name and log path, by finding the greatest number in the directories.
- Returns
(int) latest run number
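One way to picture "finding the greatest number in the directories" is the sketch below. It assumes run directories are named `<log_name>_<number>` inside log_path; that naming scheme, the helper name latest_run_id_sketch, and the return value of 0 when nothing matches are all assumptions of this example.

```python
import os
import re
import tempfile


def latest_run_id_sketch(log_path: str, log_name: str) -> int:
    """Hypothetical stand-in: largest N among directories named '<log_name>_N'."""
    if not os.path.isdir(log_path):
        return 0
    pattern = re.compile(re.escape(log_name) + r"_(\d+)")
    max_run_id = 0
    for entry in os.listdir(log_path):
        match = pattern.fullmatch(entry)
        if match and os.path.isdir(os.path.join(log_path, entry)):
            max_run_id = max(max_run_id, int(match.group(1)))
    return max_run_id


# Demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    for name in ("PPO_1", "PPO_2", "PPO_10", "DQN_3"):
        os.mkdir(os.path.join(tmp, name))
    latest_ppo = latest_run_id_sketch(tmp, "PPO")
    latest_sac = latest_run_id_sketch(tmp, "SAC")
```

Comparing the suffixes as integers rather than strings matters here: "PPO_10" must rank above "PPO_2".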
-
stable_baselines3.common.utils.
get_linear_fn
(start: float, end: float, end_fraction: float) → Callable
Create a function that interpolates linearly between start and end between progress_remaining = 1 and progress_remaining = end_fraction. This is used in DQN for linearly annealing the exploration fraction (epsilon for the epsilon-greedy strategy).
- Parameters
start – (float) value to start with if progress_remaining = 1
end – (float) value to end with if progress_remaining = 0
end_fraction – (float) fraction of progress_remaining where end is reached, e.g. 0.1 means that end is reached after 10% of the complete training process
- Returns
(Callable)
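A minimal sketch of such a schedule is given below. It follows the 10% example, so it assumes that end is reached once a fraction end_fraction of training has elapsed (that is, when progress_remaining has dropped to 1 - end_fraction) and that end_fraction is greater than 0. The name linear_fn_sketch is hypothetical.

```python
from typing import Callable


def linear_fn_sketch(start: float, end: float, end_fraction: float) -> Callable[[float], float]:
    """Hypothetical stand-in: linear interpolation from `start` to `end`."""

    def func(progress_remaining: float) -> float:
        elapsed = 1.0 - progress_remaining  # fraction of training already done
        if elapsed > end_fraction:
            # Past the annealing window: stay at the final value.
            return end
        # Inside the window: move linearly from start towards end.
        return start + elapsed * (end - start) / end_fraction

    return func


# Epsilon annealed from 1.0 to 0.05 over the first 10% of training
epsilon = linear_fn_sketch(start=1.0, end=0.05, end_fraction=0.1)
eps_begin = epsilon(1.0)
eps_mid_window = epsilon(0.95)
eps_later = epsilon(0.5)
```

After the annealing window the function stays flat at end, which is what keeps a small amount of exploration for the rest of training.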
-
stable_baselines3.common.utils.
get_schedule_fn
(value_schedule: Union[Callable, float]) → Callable
Transform (if needed) learning rate and clip range (for PPO) to callable.
- Parameters
value_schedule – (callable or float) a constant value or a schedule function
- Returns
(function)
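The "transform if needed" behaviour can be sketched as follows: callables pass through unchanged, and plain numbers are wrapped in a constant function. The name schedule_fn_sketch and the single progress argument are assumptions of this example.

```python
from typing import Callable, Union

Schedule = Callable[[float], float]


def schedule_fn_sketch(value_schedule: Union[Schedule, float]) -> Schedule:
    """Hypothetical stand-in: always hand back a callable schedule."""
    if callable(value_schedule):
        # Already a schedule: nothing to transform.
        return value_schedule
    constant = float(value_schedule)
    return lambda _progress_remaining: constant


clip_range = schedule_fn_sketch(0.2)                  # float becomes a callable
decaying_lr = schedule_fn_sketch(lambda p: 0.1 * p)   # callable is kept as is
clip_value = clip_range(0.7)
lr_value = decaying_lr(0.5)
```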
-
stable_baselines3.common.utils.
is_vectorized_observation
(observation: numpy.ndarray, observation_space: gym.spaces.space.Space) → bool
For every observation type, detects and validates the shape, then returns whether or not the observation is vectorized.
- Parameters
observation – (np.ndarray) the input observation to validate
observation_space – (gym.spaces) the observation space
- Returns
(bool) whether the given observation is vectorized or not
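For intuition, here is a simplified sketch for a single case: a space described only by its shape (as a box-like space would be). The helper name, the use of a plain shape tuple in place of a gym space, and raising ValueError on a mismatch are all assumptions of this example; the real function covers every observation type.

```python
from typing import Tuple

import numpy as np


def is_vectorized_box_sketch(observation: np.ndarray, space_shape: Tuple[int, ...]) -> bool:
    """Hypothetical stand-in: shape check against a box-like space."""
    if observation.shape == space_shape:
        # A single observation, exactly the shape of the space.
        return False
    if observation.shape[1:] == space_shape:
        # A leading batch dimension followed by the space's shape.
        return True
    raise ValueError(
        f"Unexpected observation shape {observation.shape} for a space of shape {space_shape}"
    )


single = is_vectorized_box_sketch(np.zeros(4), (4,))
batched = is_vectorized_box_sketch(np.zeros((8, 4)), (4,))
```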
-
stable_baselines3.common.utils.
polyak_update
(params: Iterable[torch.nn.parameter.Parameter], target_params: Iterable[torch.nn.parameter.Parameter], tau: float) → None
Perform a Polyak average update on target_params using params: target parameters are slowly updated towards the main parameters. tau, the soft update coefficient, controls the interpolation: tau=1 corresponds to copying the parameters to the target ones, whereas nothing happens when tau=0. The Polyak update is done in place, with no_grad, and therefore does not create intermediate tensors or a computation graph, reducing memory cost and improving performance. We scale the target params by 1-tau (in place), add the new weights, scaled by tau, and store the result of the sum in the target params (in place). See https://github.com/DLR-RM/stable-baselines3/issues/93
- Parameters
params – (Iterable[th.nn.Parameter]) parameters to use to update the target params
target_params – (Iterable[th.nn.Parameter]) parameters to update
tau – (float) the soft update coefficient (“Polyak update”, between 0 and 1)
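The arithmetic of the update, target = (1 - tau) * target + tau * param, can be sketched with NumPy arrays standing in for PyTorch parameters. This is only an illustration of the in-place scaling and adding described above: it does not involve autograd or no_grad, and the name polyak_update_sketch is hypothetical.

```python
from typing import Iterable

import numpy as np


def polyak_update_sketch(
    params: Iterable[np.ndarray], target_params: Iterable[np.ndarray], tau: float
) -> None:
    """Hypothetical stand-in: soft update of each target array, in place."""
    for param, target in zip(params, target_params):
        target *= 1.0 - tau    # scale the target in place
        target += tau * param  # add the scaled main parameters in place


main = [np.array([1.0, 2.0])]
target = [np.zeros(2)]
target_ref = target[0]  # kept to show that the same array object is modified

polyak_update_sketch(main, target, tau=0.5)
```

With tau=0.5 the target moves halfway towards the main parameters; tau=1 would copy them and tau=0 would leave the target untouched.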
-
stable_baselines3.common.utils.
safe_mean
(arr: Union[numpy.ndarray, list, collections.deque]) → numpy.ndarray
Compute the mean of an array if there is at least one element. For an empty array, return NaN. It is used for logging only.
- Parameters
arr –
- Returns