Probability Distributions

Probability distributions used for the different action spaces:

  • CategoricalDistribution -> Discrete

  • DiagGaussianDistribution -> Box (continuous actions)

  • StateDependentNoiseDistribution -> Box (continuous actions) when use_sde=True

The policy networks output parameters for the distributions (named flat in the methods). Actions are then sampled from those distributions.

For instance, in the case of discrete actions, the policy network outputs one logit per action, and a softmax turns these logits into the probability of taking each action. The CategoricalDistribution lets you sample from that distribution, compute its entropy and the log probability (log_prob) of an action, and backpropagate the gradient.
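The operations named above can be sketched in plain Python. This is only an illustration of the underlying math, not the library's PyTorch implementation; the helper names (softmax, categorical_sample, and so on) are hypothetical and do not exist in stable_baselines3.

```python
import math
import random


def softmax(logits):
    """Turn raw network outputs (logits) into probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_sample(logits, rng=random):
    """Draw one action index according to the softmax probabilities."""
    return rng.choices(range(len(logits)), weights=softmax(logits), k=1)[0]


def categorical_mode(logits):
    """Most likely action: the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


def categorical_log_prob(logits, action):
    """Log probability of a given action index."""
    return math.log(softmax(logits)[action])


def categorical_entropy(logits):
    """Shannon entropy -sum(p * log p) of the distribution."""
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0.0)


logits = [2.0, 0.5, -1.0]  # made-up logits for three discrete actions
probs = softmax(logits)
action = categorical_sample(logits)
```

With two equal logits the entropy is log(2), the maximum for two actions, and it shrinks as one logit dominates.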

In the case of continuous actions, a Gaussian distribution is used. The policy network outputs the mean and the (log) standard deviation of the distribution (assumed to be a DiagGaussianDistribution).
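For a Gaussian with diagonal covariance, each action dimension is independent, so the log probability and the entropy are sums over dimensions. The following plain-Python sketch shows the standard formulas given a mean and a log std per dimension; the function names are hypothetical and this is not the library's code.

```python
import math
import random

LOG_2PI = math.log(2.0 * math.pi)


def diag_gaussian_sample(mean, log_std, rng=random):
    """Sample each action dimension independently from N(mean, std^2)."""
    return [rng.gauss(m, math.exp(ls)) for m, ls in zip(mean, log_std)]


def diag_gaussian_log_prob(actions, mean, log_std):
    """Sum of the per-dimension Gaussian log densities."""
    total = 0.0
    for a, m, ls in zip(actions, mean, log_std):
        std = math.exp(ls)
        total += -0.5 * ((a - m) / std) ** 2 - ls - 0.5 * LOG_2PI
    return total


def diag_gaussian_entropy(log_std):
    """Sum of the per-dimension entropies 0.5 + 0.5*log(2*pi) + log_std."""
    return sum(0.5 + 0.5 * LOG_2PI + ls for ls in log_std)


mean = [0.0, 1.0]        # made-up 2-D action mean
log_std = [0.0, -1.0]    # log std may be negative; std = exp(log_std) > 0
sample = diag_gaussian_sample(mean, log_std)
```

Parameterizing by the log std keeps the standard deviation positive for any real-valued parameter.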

Probability distributions.

class stable_baselines3.common.distributions.BernoulliDistribution(action_dims)[source]

Bernoulli distribution for MultiBinary action spaces.

Parameters

action_dims (int) – Number of binary actions

actions_from_params(action_logits, deterministic=False)[source]

Returns samples from the probability distribution given its parameters.

Return type

Tensor

Returns

actions

entropy()[source]

Returns Shannon’s entropy of the probability distribution

Return type

Tensor

Returns

the entropy, or None if no analytical form is known

log_prob(actions)[source]

Returns the log likelihood

Parameters

actions – the taken action

Return type

Tensor

Returns

The log likelihood of the distribution

log_prob_from_params(action_logits)[source]

Returns samples and the associated log probabilities from the probability distribution given its parameters.

Return type

Tuple[Tensor, Tensor]

Returns

actions and log prob

mode()[source]

Returns the most likely action (deterministic output) from the probability distribution

Return type

Tensor

Returns

the most likely (deterministic) action

proba_distribution(action_logits)[source]

Set parameters of the distribution.

Return type

BernoulliDistribution

Returns

self

proba_distribution_net(latent_dim)[source]

Create the layer that represents the distribution: it will be the logits of the Bernoulli distribution.

Parameters

latent_dim (int) – Dimension of the last layer of the policy network (before the action layer)

Return type

Module

sample()[source]

Returns a sample from the probability distribution

Return type

Tensor

Returns

the stochastic action
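As a rough illustration of a Bernoulli distribution over several binary actions: each logit is mapped to a probability with a sigmoid, and the per-action log probabilities are summed because the actions are treated as independent. This is a plain-Python sketch with hypothetical helper names, assuming that summation over actions; it is not the library's implementation.

```python
import math


def sigmoid(x):
    """Map a logit to the probability that the binary action equals 1."""
    return 1.0 / (1.0 + math.exp(-x))


def bernoulli_log_prob(logits, actions):
    """Sum of log p (action == 1) or log (1 - p) (action == 0) per action."""
    total = 0.0
    for logit, a in zip(logits, actions):
        p = sigmoid(logit)
        total += math.log(p) if a == 1 else math.log(1.0 - p)
    return total


def bernoulli_mode(logits):
    """Most likely value of each binary action."""
    return [1 if sigmoid(logit) > 0.5 else 0 for logit in logits]


logits = [3.0, -3.0, 0.2]  # made-up logits for three binary actions
likely = bernoulli_mode(logits)
```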

class stable_baselines3.common.distributions.CategoricalDistribution(action_dim)[source]

Categorical distribution for discrete actions.

Parameters

action_dim (int) – Number of discrete actions

actions_from_params(action_logits, deterministic=False)[source]

Returns samples from the probability distribution given its parameters.

Return type

Tensor

Returns

actions

entropy()[source]

Returns Shannon’s entropy of the probability distribution

Return type

Tensor

Returns

the entropy, or None if no analytical form is known

log_prob(actions)[source]

Returns the log likelihood

Parameters

actions – the taken action

Return type

Tensor

Returns

The log likelihood of the distribution

log_prob_from_params(action_logits)[source]

Returns samples and the associated log probabilities from the probability distribution given its parameters.

Return type

Tuple[Tensor, Tensor]

Returns

actions and log prob

mode()[source]

Returns the most likely action (deterministic output) from the probability distribution

Return type

Tensor

Returns

the most likely (deterministic) action

proba_distribution(action_logits)[source]

Set parameters of the distribution.

Return type

CategoricalDistribution

Returns

self

proba_distribution_net(latent_dim)[source]

Create the layer that represents the distribution: it will be the logits of the Categorical distribution. You can then get probabilities using a softmax.

Parameters

latent_dim (int) – Dimension of the last layer of the policy network (before the action layer)

Return type

Module

sample()[source]

Returns a sample from the probability distribution

Return type

Tensor

Returns

the stochastic action

class stable_baselines3.common.distributions.DiagGaussianDistribution(action_dim)[source]

Gaussian distribution with diagonal covariance matrix, for continuous actions.

Parameters

action_dim (int) – Dimension of the action space.

actions_from_params(mean_actions, log_std, deterministic=False)[source]

Returns samples from the probability distribution given its parameters.

Return type

Tensor

Returns

actions

entropy()[source]

Returns Shannon’s entropy of the probability distribution

Return type

Tensor

Returns

the entropy, or None if no analytical form is known

log_prob(actions)[source]

Get the log probabilities of actions according to the distribution. Note that you must first call the proba_distribution() method.

Parameters

actions (Tensor) –

Return type

Tensor

log_prob_from_params(mean_actions, log_std)[source]

Compute the log probability of taking an action given the distribution parameters.

Parameters
  • mean_actions (Tensor) –

  • log_std (Tensor) –

Return type

Tuple[Tensor, Tensor]

Returns

actions and log prob

mode()[source]

Returns the most likely action (deterministic output) from the probability distribution

Return type

Tensor

Returns

the most likely (deterministic) action

proba_distribution(mean_actions, log_std)[source]

Create the distribution given its parameters (mean, log std)

Parameters
  • mean_actions (Tensor) –

  • log_std (Tensor) –

Return type

DiagGaussianDistribution

Returns

self

proba_distribution_net(latent_dim, log_std_init=0.0)[source]

Create the layers and parameter that represent the distribution: one output will be the mean of the Gaussian, the other parameter will be the standard deviation (in fact the log std, which allows negative values).

Parameters
  • latent_dim (int) – Dimension of the last layer of the policy (before the action layer)

  • log_std_init (float) – Initial value for the log standard deviation

Return type

Tuple[Module, Parameter]

sample()[source]

Returns a sample from the probability distribution

Return type

Tensor

Returns

the stochastic action

class stable_baselines3.common.distributions.Distribution[source]

Abstract base class for distributions.

abstract actions_from_params(*args, **kwargs)[source]

Returns samples from the probability distribution given its parameters.

Return type

Tensor

Returns

actions

abstract entropy()[source]

Returns Shannon’s entropy of the probability distribution

Return type

Optional[Tensor]

Returns

the entropy, or None if no analytical form is known

get_actions(deterministic=False)[source]

Return actions according to the probability distribution.

Parameters

deterministic (bool) –

Return type

Tensor

abstract log_prob(x)[source]

Returns the log likelihood

Parameters

x (Tensor) – the taken action

Return type

Tensor

Returns

The log likelihood of the distribution

abstract log_prob_from_params(*args, **kwargs)[source]

Returns samples and the associated log probabilities from the probability distribution given its parameters.

Return type

Tuple[Tensor, Tensor]

Returns

actions and log prob

abstract mode()[source]

Returns the most likely action (deterministic output) from the probability distribution

Return type

Tensor

Returns

the most likely (deterministic) action

abstract proba_distribution(*args, **kwargs)[source]

Set parameters of the distribution.

Return type

Distribution

Returns

self

abstract proba_distribution_net(*args, **kwargs)[source]

Create the layers and parameters that represent the distribution.

Subclasses must define this, but the arguments and return type vary between concrete classes.

Return type

Union[Module, Tuple[Module, Parameter]]

abstract sample()[source]

Returns a sample from the probability distribution

Return type

Tensor

Returns

the stochastic action

class stable_baselines3.common.distributions.MultiCategoricalDistribution(action_dims)[source]

MultiCategorical distribution for multi discrete actions.

Parameters

action_dims (List[int]) – List of sizes of discrete action spaces

actions_from_params(action_logits, deterministic=False)[source]

Returns samples from the probability distribution given its parameters.

Return type

Tensor

Returns

actions

entropy()[source]

Returns Shannon’s entropy of the probability distribution

Return type

Tensor

Returns

the entropy, or None if no analytical form is known

log_prob(actions)[source]

Returns the log likelihood

Parameters

actions – the taken action

Return type

Tensor

Returns

The log likelihood of the distribution

log_prob_from_params(action_logits)[source]

Returns samples and the associated log probabilities from the probability distribution given its parameters.

Return type

Tuple[Tensor, Tensor]

Returns

actions and log prob

mode()[source]

Returns the most likely action (deterministic output) from the probability distribution

Return type

Tensor

Returns

the most likely (deterministic) action

proba_distribution(action_logits)[source]

Set parameters of the distribution.

Return type

MultiCategoricalDistribution

Returns

self

proba_distribution_net(latent_dim)[source]

Create the layer that represents the distribution: it will be the logits (flattened) of the MultiCategorical distribution. You can then get probabilities using a softmax on each sub-space.

Parameters

latent_dim (int) – Dimension of the last layer of the policy network (before the action layer)

Return type

Module

sample()[source]

Returns a sample from the probability distribution

Return type

Tensor

Returns

the stochastic action
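The "flattened logits" used by MultiCategoricalDistribution can be pictured as one long vector that is split into one chunk per sub-space, with a softmax applied to each chunk. The plain-Python sketch below illustrates that idea under the assumption that sub-spaces are independent, so their log probabilities add up; the helper names are hypothetical.

```python
import math


def softmax(logits):
    """Normalize one chunk of logits into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def split_logits(flat_logits, action_dims):
    """Cut the flat vector into consecutive chunks, one per sub-space."""
    chunks, start = [], 0
    for n in action_dims:
        chunks.append(flat_logits[start:start + n])
        start += n
    return chunks


def multi_categorical_log_prob(flat_logits, action_dims, actions):
    """Sum the log probability of the chosen index in each sub-space."""
    chunks = split_logits(flat_logits, action_dims)
    return sum(math.log(softmax(c)[a]) for c, a in zip(chunks, actions))


action_dims = [2, 3]                 # two sub-spaces with 2 and 3 choices
flat = [0.0, 0.0, 0.0, 0.0, 0.0]     # sum(action_dims) logits in total
lp = multi_categorical_log_prob(flat, action_dims, [0, 2])
```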

class stable_baselines3.common.distributions.SquashedDiagGaussianDistribution(action_dim, epsilon=1e-06)[source]

Gaussian distribution with diagonal covariance matrix, followed by a squashing function (tanh) to ensure bounds.

Parameters
  • action_dim (int) – Dimension of the action space.

  • epsilon (float) – small value to avoid NaN due to numerical imprecision.

entropy()[source]

Returns Shannon’s entropy of the probability distribution

Return type

Optional[Tensor]

Returns

the entropy, or None if no analytical form is known

log_prob(actions, gaussian_actions=None)[source]

Get the log probabilities of actions according to the distribution. Note that you must first call the proba_distribution() method.

Parameters

  • actions (Tensor) –

  • gaussian_actions (Optional[Tensor]) – the actions before squashing, if already available

Return type

Tensor

log_prob_from_params(mean_actions, log_std)[source]

Compute the log probability of taking an action given the distribution parameters.

Parameters
  • mean_actions (Tensor) –

  • log_std (Tensor) –

Return type

Tuple[Tensor, Tensor]

Returns

actions and log prob

mode()[source]

Returns the most likely action (deterministic output) from the probability distribution

Return type

Tensor

Returns

the most likely (deterministic) action

proba_distribution(mean_actions, log_std)[source]

Create the distribution given its parameters (mean, log std)

Parameters
  • mean_actions (Tensor) –

  • log_std (Tensor) –

Return type

SquashedDiagGaussianDistribution

Returns

self

sample()[source]

Returns a sample from the probability distribution

Return type

Tensor

Returns

the stochastic action
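When a Gaussian sample u is squashed with tanh, the density of the squashed action a = tanh(u) follows from the change-of-variables formula: log p(a) = log p(u) - sum(log(1 - tanh(u)^2)). The sketch below illustrates this correction in plain Python and shows where a small epsilon helps; the function name is hypothetical and the code is not taken from the library.

```python
import math

LOG_2PI = math.log(2.0 * math.pi)


def squashed_log_prob(gaussian_actions, mean, log_std, epsilon=1e-6):
    """Log density of tanh(u), given the pre-squash Gaussian sample u."""
    log_prob = 0.0
    for u, m, ls in zip(gaussian_actions, mean, log_std):
        std = math.exp(ls)
        # Log density of the unsquashed Gaussian sample
        log_prob += -0.5 * ((u - m) / std) ** 2 - ls - 0.5 * LOG_2PI
        # Change-of-variables correction for a = tanh(u); epsilon keeps
        # the log finite when tanh(u) rounds to exactly +/-1
        log_prob -= math.log(1.0 - math.tanh(u) ** 2 + epsilon)
    return log_prob


u = [0.3, -2.0]                          # made-up pre-squash samples
actions = [math.tanh(x) for x in u]      # squashed actions in (-1, 1)
lp = squashed_log_prob(u, [0.0, 0.0], [0.0, 0.0])
```

Without the epsilon term, a large pre-squash value such as u = 50 would give log(0) because tanh(50) equals 1.0 in floating point.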

class stable_baselines3.common.distributions.StateDependentNoiseDistribution(action_dim, full_std=True, use_expln=False, squash_output=False, learn_features=False, epsilon=1e-06)[source]

Distribution class for using generalized State Dependent Exploration (gSDE). Paper: https://arxiv.org/abs/2005.05719

It is used to create the noise exploration matrix and compute the log probability of an action with that noise.

Parameters
  • action_dim (int) – Dimension of the action space.

  • full_std (bool) – Whether to use (n_features x n_actions) parameters for the std instead of only (n_features,)

  • use_expln (bool) – Use expln() function instead of exp() to ensure a positive standard deviation (cf. paper). It keeps the variance above zero and prevents it from growing too fast. In practice, exp() is usually enough.

  • squash_output (bool) – Whether to squash the output using a tanh function, which ensures that the bounds are satisfied.

  • learn_features (bool) – Whether to learn features for gSDE or not. This will enable gradients to be backpropagated through the features latent_sde in the code.

  • epsilon (float) – small value to avoid NaN due to numerical imprecision.
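To make the use_expln option concrete, here is a plain-Python sketch of an expln-style function as described in the gSDE paper: it matches exp(x) for non-positive inputs and grows only logarithmically for positive ones. This is an illustrative scalar version under that assumption, not the library's tensor implementation.

```python
import math


def expln(x):
    """exp(x) for x <= 0, and log(1 + x) + 1 for x > 0.

    Both branches equal 1 at x = 0, so the function is continuous,
    always positive, and grows slowly for large x.
    """
    if x <= 0.0:
        return math.exp(x)
    return math.log1p(x) + 1.0


# Compare growth for a large log std value
slow = expln(10.0)
fast = math.exp(10.0)
```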

actions_from_params(mean_actions, log_std, latent_sde, deterministic=False)[source]

Returns samples from the probability distribution given its parameters.

Return type

Tensor

Returns

actions

entropy()[source]

Returns Shannon’s entropy of the probability distribution

Return type

Optional[Tensor]

Returns

the entropy, or None if no analytical form is known

get_std(log_std)[source]

Get the standard deviation from the learned parameter (log of it by default). This ensures that the std is positive.

Parameters

log_std (Tensor) –

Return type

Tensor

log_prob(actions)[source]

Returns the log likelihood

Parameters

actions – the taken action

Return type

Tensor

Returns

The log likelihood of the distribution

log_prob_from_params(mean_actions, log_std, latent_sde)[source]

Returns samples and the associated log probabilities from the probability distribution given its parameters.

Return type

Tuple[Tensor, Tensor]

Returns

actions and log prob

mode()[source]

Returns the most likely action (deterministic output) from the probability distribution

Return type

Tensor

Returns

the most likely (deterministic) action

proba_distribution(mean_actions, log_std, latent_sde)[source]

Create the distribution given its parameters (mean, log std)

Parameters
  • mean_actions (Tensor) –

  • log_std (Tensor) –

  • latent_sde (Tensor) –

Return type

StateDependentNoiseDistribution

Returns

self

proba_distribution_net(latent_dim, log_std_init=-2.0, latent_sde_dim=None)[source]

Create the layers and parameter that represent the distribution: one output will be the deterministic action, the other parameter will be the standard deviation of the distribution that controls the weights of the noise matrix.

Parameters
  • latent_dim (int) – Dimension of the last layer of the policy (before the action layer)

  • log_std_init (float) – Initial value for the log standard deviation

  • latent_sde_dim (Optional[int]) – Dimension of the last layer of the features extractor for gSDE. By default, it is shared with the policy network.

Return type

Tuple[Module, Parameter]

sample()[source]

Returns a sample from the probability distribution

Return type

Tensor

Returns

the stochastic action

sample_weights(log_std, batch_size=1)[source]

Sample weights for the noise exploration matrix, using a centered Gaussian distribution.

Parameters
  • log_std (Tensor) –

  • batch_size (int) –

Return type

None
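The idea behind the noise exploration matrix can be sketched as follows: weights are drawn once from a centered Gaussian whose std comes from log_std, and the noise added to the action is the product of the latent features with that matrix. As long as the matrix is not resampled, the same state features always produce the same noise. This is a plain-Python illustration with hypothetical function names, assuming the full_std layout of (n_features x n_actions); it is not the library's code.

```python
import math
import random


def sample_exploration_matrix(log_std, rng=random):
    """Draw one weight per (feature, action) pair from N(0, exp(log_std)^2)."""
    return [[rng.gauss(0.0, math.exp(ls)) for ls in row] for row in log_std]


def state_dependent_noise(latent_sde, exploration_matrix):
    """Noise per action: latent features (n_features,) times the matrix."""
    n_actions = len(exploration_matrix[0])
    return [
        sum(f * exploration_matrix[j][i] for j, f in enumerate(latent_sde))
        for i in range(n_actions)
    ]


log_std = [[-2.0, -2.0] for _ in range(3)]  # 3 features, 2 actions
matrix = sample_exploration_matrix(log_std)
features = [0.5, -1.0, 2.0]                 # made-up latent features
noise_a = state_dependent_noise(features, matrix)
noise_b = state_dependent_noise(features, matrix)  # same state, same noise
```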

class stable_baselines3.common.distributions.TanhBijector(epsilon=1e-06)[source]

Bijective transformation of a probability distribution using a squashing function (tanh).

TODO: use Pyro instead (https://pyro.ai/)

Parameters

epsilon (float) – small value to avoid NaN due to numerical imprecision.

static atanh(x)[source]

Inverse of Tanh

Taken from Pyro (https://github.com/pyro-ppl/pyro): 0.5 * torch.log((1 + x) / (1 - x))

Return type

Tensor
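The atanh formula quoted above can be checked with a scalar plain-Python version. The clipping helper below is an assumption added for illustration: it shows why a small epsilon is useful, since the formula divides by zero at exactly 1. The name inverse_tanh and its eps default are hypothetical.

```python
import math


def atanh(x):
    """Inverse of tanh, using 0.5 * log((1 + x) / (1 - x))."""
    return 0.5 * math.log((1.0 + x) / (1.0 - x))


def inverse_tanh(y, eps=1e-6):
    """Clip y into the open interval (-1, 1) before inverting tanh."""
    y = min(max(y, -1.0 + eps), 1.0 - eps)
    return atanh(y)


roundtrip = atanh(math.tanh(0.7))   # recovers the original value
at_bound = inverse_tanh(1.0)        # finite thanks to the clipping
```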

static inverse(y)[source]

Inverse tanh.

Parameters

y (Tensor) –

Return type

Tensor

stable_baselines3.common.distributions.kl_divergence(dist_true, dist_pred)[source]

Wrapper for the PyTorch implementation of the full form KL Divergence

Parameters
  • dist_true (Distribution) – the p distribution

  • dist_pred (Distribution) – the q distribution

Return type

Tensor

Returns

KL(dist_true||dist_pred)
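For reference, the KL divergence has simple closed forms for the two most common cases. The plain-Python sketch below shows them for categorical probabilities and for one-dimensional Gaussians; it only illustrates the quantity being computed and does not reproduce the PyTorch wrapper. The function names are hypothetical.

```python
import math


def categorical_kl(p_true, p_pred):
    """KL(p_true || p_pred) = sum(p * log(p / q)) over the actions."""
    return sum(p * math.log(p / q) for p, q in zip(p_true, p_pred) if p > 0.0)


def gaussian_kl(mean_true, std_true, mean_pred, std_pred):
    """KL between two 1-D Gaussians N(mean_true, std_true^2) and N(mean_pred, std_pred^2)."""
    return (
        math.log(std_pred / std_true)
        + (std_true ** 2 + (mean_true - mean_pred) ** 2) / (2.0 * std_pred ** 2)
        - 0.5
    )


p = [0.5, 0.5]
q = [0.9, 0.1]
forward = categorical_kl(p, q)
backward = categorical_kl(q, p)  # KL is not symmetric
```

For a diagonal Gaussian, the per-dimension values from gaussian_kl would be summed, since the dimensions are independent.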

stable_baselines3.common.distributions.make_proba_distribution(action_space, use_sde=False, dist_kwargs=None)[source]

Return an instance of Distribution for the correct type of action space

Parameters
  • action_space (Space) – the input action space

  • use_sde (bool) – Force the use of StateDependentNoiseDistribution instead of DiagGaussianDistribution

  • dist_kwargs (Optional[Dict[str, Any]]) – Keyword arguments to pass to the probability distribution

Return type

Distribution

Returns

the appropriate Distribution object
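The selection logic can be summarized with a small dispatch sketch based on the mapping listed on this page. To stay self-contained it uses plain strings as stand-ins for the action space types and returns class names rather than instances; the function choose_distribution_name is hypothetical, and the space names "MultiDiscrete" and "MultiBinary" are assumed to correspond to the multi discrete and MultiBinary spaces mentioned above.

```python
def choose_distribution_name(space_type, use_sde=False):
    """Return the name of the distribution class matching a space type."""
    if space_type == "Box":
        # Continuous actions: gSDE only when explicitly requested
        if use_sde:
            return "StateDependentNoiseDistribution"
        return "DiagGaussianDistribution"
    mapping = {
        "Discrete": "CategoricalDistribution",
        "MultiDiscrete": "MultiCategoricalDistribution",
        "MultiBinary": "BernoulliDistribution",
    }
    if space_type in mapping:
        return mapping[space_type]
    raise NotImplementedError(f"No distribution for space type {space_type!r}")


name = choose_distribution_name("Box", use_sde=True)
```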

stable_baselines3.common.distributions.sum_independent_dims(tensor)[source]

Continuous actions are usually considered to be independent, so we can sum components of the log_prob or the entropy.

Parameters

tensor (Tensor) – shape: (n_batch, n_actions) or (n_batch,)

Return type

Tensor

Returns

shape: (n_batch,)
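For the two-dimensional case, the reduction amounts to summing each row. The sketch below uses nested lists in place of a tensor of shape (n_batch, n_actions); the function name sum_over_action_dims is hypothetical and the code only illustrates this one case.

```python
def sum_over_action_dims(rows):
    """Sum per-action values for each batch entry: (n_batch, n_actions) -> (n_batch,)."""
    return [sum(row) for row in rows]


# Made-up per-dimension log probabilities for a batch of 2 with 3 action dims
per_dim_log_prob = [
    [-1.0, -0.5, -0.25],
    [-2.0, -2.0, -2.0],
]
joint_log_prob = sum_over_action_dims(per_dim_log_prob)
```

Summing log probabilities corresponds to multiplying the probabilities of independent action components.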