Utils

stable_baselines3.common.utils.check_for_correct_spaces(env: Union[gym.core.Env, stable_baselines3.common.vec_env.base_vec_env.VecEnv], observation_space: gym.spaces.space.Space, action_space: gym.spaces.space.Space)[source]

Checks that the environment has the same spaces as the provided ones. Used by BaseAlgorithm to check that spaces match after loading the model with a given env. Checked parameters: observation_space and action_space.

Parameters
  • env – (GymEnv) Environment to check for valid spaces

  • observation_space – (gym.spaces.Space) Observation space to check against

  • action_space – (gym.spaces.Space) Action space to check against
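
For illustration, a minimal usage sketch; the check raises a ValueError when the spaces do not match (the CartPole-v1 environment here is just an example):

    import gym
    from stable_baselines3.common.utils import check_for_correct_spaces

    env = gym.make("CartPole-v1")
    # Passes silently, since the spaces come from the same environment
    check_for_correct_spaces(env, env.observation_space, env.action_space)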

stable_baselines3.common.utils.configure_logger(verbose: int = 0, tensorboard_log: Optional[str] = None, tb_log_name: str = '', reset_num_timesteps: bool = True) → None[source]

Configure the logger’s outputs.

Parameters
  • verbose – (int) the verbosity level: 0 no output, 1 info, 2 debug

  • tensorboard_log – (str) the log location for tensorboard (if None, no logging)

  • tb_log_name – (str) the name of the run for the tensorboard log

  • reset_num_timesteps – (bool) whether to reset or not the current number of timesteps
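
This function is normally called internally by BaseAlgorithm; a direct call might look like the following sketch (paths and names are illustrative):

    from stable_baselines3.common.utils import configure_logger

    # Log at info level and write tensorboard events under ./tb_logs/
    configure_logger(verbose=1, tensorboard_log="./tb_logs/", tb_log_name="PPO")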

stable_baselines3.common.utils.constant_fn(val: float) → Callable[source]

Create a function that returns a constant. It is useful for learning rate schedules (to avoid code duplication).

Parameters

val – (float) constant value to return

Returns

(Callable) constant schedule function
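
A short usage sketch; the returned callable ignores its argument (typically the remaining training progress) and always yields the constant:

    from stable_baselines3.common.utils import constant_fn

    lr_schedule = constant_fn(1e-3)
    print(lr_schedule(0.5))  # 1e-3, regardless of the progress value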

stable_baselines3.common.utils.explained_variance(y_pred: numpy.ndarray, y_true: numpy.ndarray) → numpy.ndarray[source]

Computes the fraction of variance that y_pred explains about y_true. Returns 1 - Var[y_true - y_pred] / Var[y_true]

Interpretation:
  • ev = 0 => might as well have predicted zero

  • ev = 1 => perfect prediction

  • ev < 0 => worse than just predicting zero

Parameters
  • y_pred – (np.ndarray) the prediction

  • y_true – (np.ndarray) the expected value

Returns

(float) explained variance of y_pred with respect to y_true
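
A small numeric sketch, using the formula above:

    import numpy as np
    from stable_baselines3.common.utils import explained_variance

    y_true = np.array([1.0, 2.0, 3.0, 4.0])
    y_pred = np.array([0.9, 2.1, 2.9, 4.2])
    # Equivalent to: 1 - np.var(y_true - y_pred) / np.var(y_true)
    print(explained_variance(y_pred, y_true))  # close to 1: good prediction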

stable_baselines3.common.utils.get_device(device: Union[torch.device, str] = 'auto') → torch.device[source]

Retrieve the PyTorch device. It checks that the requested device is available first. For now, it supports only 'cpu' and 'cuda'. By default, it tries to use the GPU.

Parameters

device – (Union[str, th.device]) One of 'auto', 'cuda', 'cpu'

Returns

(th.device) the selected device
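
A quick usage sketch:

    from stable_baselines3.common.utils import get_device

    device = get_device("auto")  # cuda if available, otherwise cpu
    print(device)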

stable_baselines3.common.utils.get_latest_run_id(log_path: Optional[str] = None, log_name: str = '') → int[source]

Returns the latest run number for the given log name and log path, by finding the greatest run number among the matching directory names.

Parameters
  • log_path – (str) path to the log folder

  • log_name – (str) name of the experiment

Returns

(int) latest run number
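
A usage sketch, assuming run directories follow the "<log_name>_<run_id>" naming convention used by the tensorboard logging (paths are illustrative):

    from stable_baselines3.common.utils import get_latest_run_id

    # With ./tb_logs/PPO_1 and ./tb_logs/PPO_2 on disk, this returns 2
    run_id = get_latest_run_id(log_path="./tb_logs/", log_name="PPO")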

stable_baselines3.common.utils.get_linear_fn(start: float, end: float, end_fraction: float) → Callable[source]

Create a function that interpolates linearly from start to end over the first end_fraction of training (i.e. while progress_remaining goes from 1 down to 1 - end_fraction), then stays at end. This is used in DQN for linearly annealing the exploration fraction (epsilon for the epsilon-greedy strategy).

Parameters
  • start – (float) value to start with if progress_remaining = 1

  • end – (float) value to end with if progress_remaining = 0

  • end_fraction – (float) fraction of training at which end is reached, e.g. 0.1 means end is reached after 10% of the complete training process

Returns

(Callable) linear schedule function
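
An illustrative sketch of the resulting schedule, assuming the linear interpolation described above:

    from stable_baselines3.common.utils import get_linear_fn

    # Anneal epsilon from 1.0 to 0.05 over the first 10% of training
    epsilon_schedule = get_linear_fn(start=1.0, end=0.05, end_fraction=0.1)
    print(epsilon_schedule(1.0))   # 1.0, start of training
    print(epsilon_schedule(0.95))  # ~0.525, halfway through the annealing
    print(epsilon_schedule(0.5))   # 0.05, end value already reached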

stable_baselines3.common.utils.get_schedule_fn(value_schedule: Union[Callable, float]) → Callable[source]

Transform (if needed) a learning rate or clip range (for PPO) into a callable.

Parameters

value_schedule – (callable or float) constant value or schedule function

Returns

(Callable) schedule function
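
A short sketch of both accepted inputs:

    from stable_baselines3.common.utils import get_schedule_fn

    # A float is wrapped into a constant schedule
    clip_range = get_schedule_fn(0.2)
    print(clip_range(0.5))  # 0.2

    # A callable (progress_remaining -> value) is used as-is
    lr = get_schedule_fn(lambda progress_remaining: 3e-4 * progress_remaining)
    print(lr(0.5))  # 1.5e-4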

stable_baselines3.common.utils.is_vectorized_observation(observation: numpy.ndarray, observation_space: gym.spaces.space.Space) → bool[source]

For every observation type, detects and validates the shape, then returns whether or not the observation is vectorized.

Parameters
  • observation – (np.ndarray) the input observation to validate

  • observation_space – (gym.spaces.Space) the observation space

Returns

(bool) whether the given observation is vectorized or not
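
A small sketch with a Box space:

    import numpy as np
    from gym import spaces
    from stable_baselines3.common.utils import is_vectorized_observation

    space = spaces.Box(low=-1, high=1, shape=(4,), dtype=np.float32)
    print(is_vectorized_observation(np.zeros((4,), dtype=np.float32), space))   # False
    print(is_vectorized_observation(np.zeros((8, 4), dtype=np.float32), space)) # True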

stable_baselines3.common.utils.polyak_update(params: Iterable[torch.nn.parameter.Parameter], target_params: Iterable[torch.nn.parameter.Parameter], tau: float) → None[source]

Perform a Polyak average update on target_params using params: target parameters are slowly updated towards the main parameters. tau, the soft update coefficient, controls the interpolation: tau=1 corresponds to copying the parameters to the target ones, whereas nothing happens when tau=0. The Polyak update is done in place, with no_grad, and therefore does not create intermediate tensors or a computation graph, reducing memory cost and improving performance. We scale the target params by 1 - tau (in place), add the new weights scaled by tau, and store the result of the sum in the target params (in place). See https://github.com/DLR-RM/stable-baselines3/issues/93

Parameters
  • params – (Iterable[th.nn.Parameter]) parameters to use to update the target params

  • target_params – (Iterable[th.nn.Parameter]) parameters to update

  • tau – (float) the soft update coefficient (“Polyak update”, between 0 and 1)
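
A minimal sketch of the equivalent in-place computation (polyak_update_sketch is a hypothetical name, not part of the library), followed by a typical call:

    import torch as th

    def polyak_update_sketch(params, target_params, tau):
        # target <- (1 - tau) * target + tau * param, in place and without grads
        with th.no_grad():
            for param, target_param in zip(params, target_params):
                target_param.data.mul_(1 - tau)
                target_param.data.add_(param.data, alpha=tau)

    # Typical usage of the library function, e.g. for a DQN-style target network:
    # polyak_update(online_net.parameters(), target_net.parameters(), tau=0.005)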

stable_baselines3.common.utils.safe_mean(arr: Union[numpy.ndarray, list, collections.deque]) → numpy.ndarray[source]

Compute the mean of an array if there is at least one element. For an empty array, NaN is returned. It is used for logging only.

Parameters

arr – (Union[np.ndarray, list, deque]) array to average

Returns

(np.ndarray) mean of the array, or NaN if the array is empty
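
A short sketch:

    import numpy as np
    from stable_baselines3.common.utils import safe_mean

    print(safe_mean([1.0, 2.0, 3.0]))  # 2.0
    print(safe_mean([]))               # nan, instead of a NumPy warning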

stable_baselines3.common.utils.set_random_seed(seed: int, using_cuda: bool = False) → None[source]

Seed the different random generators.

Parameters
  • seed – (int)

  • using_cuda – (bool)
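
A usage sketch; pass using_cuda=True when training on GPU so that the CUDA-related generators are configured as well:

    from stable_baselines3.common.utils import set_random_seed

    # Seeds Python's random module, NumPy and PyTorch
    set_random_seed(42, using_cuda=True)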

stable_baselines3.common.utils.update_learning_rate(optimizer: torch.optim.optimizer.Optimizer, learning_rate: float) → None[source]

Update the learning rate for a given optimizer. Useful when using a linear schedule.

Parameters
  • optimizer – (th.optim.Optimizer) the optimizer to update

  • learning_rate – (float) the new learning rate value
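
A minimal sketch of the equivalent operation on a PyTorch optimizer (update_lr_sketch is a hypothetical name, not part of the library):

    import torch as th

    def update_lr_sketch(optimizer: th.optim.Optimizer, learning_rate: float) -> None:
        # Set the learning rate of every parameter group in place
        for param_group in optimizer.param_groups:
            param_group["lr"] = learning_rate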