Probability Distributions¶
Probability distributions used for the different action spaces:
- CategoricalDistribution -> Discrete
- DiagGaussianDistribution -> Box (continuous actions)
- StateDependentNoiseDistribution -> Box (continuous actions) when use_sde=True
The policy networks output parameters for the distributions (named flat in the methods).
Actions are then sampled from those distributions.
For instance, in the case of discrete actions, the policy network outputs the probability of taking each action. The CategoricalDistribution allows sampling from it, computing the entropy and the log probability (log_prob), and backpropagating the gradient.
In the case of continuous actions, a Gaussian distribution is used. The policy network outputs the mean and (log) std of the distribution (assumed to be a DiagGaussianDistribution).
- class stable_baselines3.common.distributions.BernoulliDistribution(action_dims)[source]¶
Bernoulli distribution for MultiBinary action spaces.
- Parameters:
action_dims (int) – Number of binary actions
- actions_from_params(action_logits, deterministic=False)[source]¶
Returns samples from the probability distribution given its parameters.
- Return type:
Tensor
- Returns:
actions
- entropy()[source]¶
Returns Shannon’s entropy of the probability distribution
- Return type:
Tensor
- Returns:
the entropy, or None if no analytical form is known
- log_prob(actions)[source]¶
Returns the log likelihood
- Parameters:
actions (Tensor) – the taken actions
- Return type:
Tensor
- Returns:
The log likelihood of the distribution
- log_prob_from_params(action_logits)[source]¶
Returns samples and the associated log probabilities from the probability distribution given its parameters.
- Return type:
Tuple[Tensor, Tensor]
- Returns:
actions and log prob
- mode()[source]¶
Returns the most likely action (deterministic output) from the probability distribution
- Return type:
Tensor
- Returns:
the most likely action
- proba_distribution(action_logits)[source]¶
Set parameters of the distribution.
- Return type:
TypeVar(SelfBernoulliDistribution, bound=BernoulliDistribution)
- Returns:
self
- class stable_baselines3.common.distributions.CategoricalDistribution(action_dim)[source]¶
Categorical distribution for discrete actions.
- Parameters:
action_dim (int) – Number of discrete actions
- actions_from_params(action_logits, deterministic=False)[source]¶
Returns samples from the probability distribution given its parameters.
- Return type:
Tensor
- Returns:
actions
- entropy()[source]¶
Returns Shannon’s entropy of the probability distribution
- Return type:
Tensor
- Returns:
the entropy, or None if no analytical form is known
- log_prob(actions)[source]¶
Returns the log likelihood
- Parameters:
actions (Tensor) – the taken actions
- Return type:
Tensor
- Returns:
The log likelihood of the distribution
- log_prob_from_params(action_logits)[source]¶
Returns samples and the associated log probabilities from the probability distribution given its parameters.
- Return type:
Tuple[Tensor, Tensor]
- Returns:
actions and log prob
- mode()[source]¶
Returns the most likely action (deterministic output) from the probability distribution
- Return type:
Tensor
- Returns:
the most likely action
- proba_distribution(action_logits)[source]¶
Set parameters of the distribution.
- Return type:
TypeVar(SelfCategoricalDistribution, bound=CategoricalDistribution)
- Returns:
self
- proba_distribution_net(latent_dim)[source]¶
Create the layer that represents the distribution: it will be the logits of the Categorical distribution. You can then get probabilities using a softmax.
- Parameters:
latent_dim (int) – Dimension of the last layer of the policy network (before the action layer)
- Return type:
Module
- Returns:
- class stable_baselines3.common.distributions.DiagGaussianDistribution(action_dim)[source]¶
Gaussian distribution with diagonal covariance matrix, for continuous actions.
- Parameters:
action_dim (int) – Dimension of the action space.
- actions_from_params(mean_actions, log_std, deterministic=False)[source]¶
Returns samples from the probability distribution given its parameters.
- Return type:
Tensor
- Returns:
actions
- entropy()[source]¶
Returns Shannon’s entropy of the probability distribution
- Return type:
Tensor
- Returns:
the entropy, or None if no analytical form is known
- log_prob(actions)[source]¶
Get the log probabilities of actions according to the distribution. Note that you must first call the proba_distribution() method.
- Parameters:
actions (Tensor) –
- Return type:
Tensor
- Returns:
- log_prob_from_params(mean_actions, log_std)[source]¶
Compute the log probability of taking an action given the distribution parameters.
- Parameters:
mean_actions (Tensor) –
log_std (Tensor) –
- Return type:
Tuple[Tensor, Tensor]
- Returns:
- mode()[source]¶
Returns the most likely action (deterministic output) from the probability distribution
- Return type:
Tensor
- Returns:
the most likely action
- proba_distribution(mean_actions, log_std)[source]¶
Create the distribution given its parameters (mean, std)
- Parameters:
mean_actions (Tensor) –
log_std (Tensor) –
- Return type:
TypeVar(SelfDiagGaussianDistribution, bound=DiagGaussianDistribution)
- Returns:
- proba_distribution_net(latent_dim, log_std_init=0.0)[source]¶
Create the layers and parameter that represent the distribution: one output will be the mean of the Gaussian, the other parameter will be the standard deviation (log std in fact to allow negative values)
- Parameters:
latent_dim (int) – Dimension of the last layer of the policy (before the action layer)
log_std_init (float) – Initial value for the log standard deviation
- Return type:
Tuple[Module, Parameter]
- Returns:
- class stable_baselines3.common.distributions.Distribution[source]¶
Abstract base class for distributions.
- abstract actions_from_params(*args, **kwargs)[source]¶
Returns samples from the probability distribution given its parameters.
- Return type:
Tensor
- Returns:
actions
- abstract entropy()[source]¶
Returns Shannon’s entropy of the probability distribution
- Return type:
Optional[Tensor]
- Returns:
the entropy, or None if no analytical form is known
- get_actions(deterministic=False)[source]¶
Return actions according to the probability distribution.
- Parameters:
deterministic (bool) –
- Return type:
Tensor
- Returns:
- abstract log_prob(x)[source]¶
Returns the log likelihood
- Parameters:
x (Tensor) – the taken action
- Return type:
Tensor
- Returns:
The log likelihood of the distribution
- abstract log_prob_from_params(*args, **kwargs)[source]¶
Returns samples and the associated log probabilities from the probability distribution given its parameters.
- Return type:
Tuple[Tensor, Tensor]
- Returns:
actions and log prob
- abstract mode()[source]¶
Returns the most likely action (deterministic output) from the probability distribution
- Return type:
Tensor
- Returns:
the most likely action
- abstract proba_distribution(*args, **kwargs)[source]¶
Set parameters of the distribution.
- Return type:
TypeVar(SelfDistribution, bound=Distribution)
- Returns:
self
- class stable_baselines3.common.distributions.MultiCategoricalDistribution(action_dims)[source]¶
MultiCategorical distribution for multi discrete actions.
- Parameters:
action_dims (List[int]) – List of sizes of discrete action spaces
- actions_from_params(action_logits, deterministic=False)[source]¶
Returns samples from the probability distribution given its parameters.
- Return type:
Tensor
- Returns:
actions
- entropy()[source]¶
Returns Shannon’s entropy of the probability distribution
- Return type:
Tensor
- Returns:
the entropy, or None if no analytical form is known
- log_prob(actions)[source]¶
Returns the log likelihood
- Parameters:
actions (Tensor) – the taken actions
- Return type:
Tensor
- Returns:
The log likelihood of the distribution
- log_prob_from_params(action_logits)[source]¶
Returns samples and the associated log probabilities from the probability distribution given its parameters.
- Return type:
Tuple[Tensor, Tensor]
- Returns:
actions and log prob
- mode()[source]¶
Returns the most likely action (deterministic output) from the probability distribution
- Return type:
Tensor
- Returns:
the most likely action
- proba_distribution(action_logits)[source]¶
Set parameters of the distribution.
- Return type:
TypeVar(SelfMultiCategoricalDistribution, bound=MultiCategoricalDistribution)
- Returns:
self
- proba_distribution_net(latent_dim)[source]¶
Create the layer that represents the distribution: it will be the logits (flattened) of the MultiCategorical distribution. You can then get probabilities using a softmax on each sub-space.
- Parameters:
latent_dim (int) – Dimension of the last layer of the policy network (before the action layer)
- Return type:
Module
- Returns:
- class stable_baselines3.common.distributions.SquashedDiagGaussianDistribution(action_dim, epsilon=1e-06)[source]¶
Gaussian distribution with diagonal covariance matrix, followed by a squashing function (tanh) to ensure bounds.
- Parameters:
action_dim (int) – Dimension of the action space.
epsilon (float) – small value to avoid NaN due to numerical imprecision.
- entropy()[source]¶
Returns Shannon’s entropy of the probability distribution
- Return type:
Optional[Tensor]
- Returns:
the entropy, or None if no analytical form is known
- log_prob(actions, gaussian_actions=None)[source]¶
Get the log probabilities of actions according to the distribution. Note that you must first call the proba_distribution() method.
- Parameters:
actions (Tensor) –
- Return type:
Tensor
- Returns:
- log_prob_from_params(mean_actions, log_std)[source]¶
Compute the log probability of taking an action given the distribution parameters.
- Parameters:
mean_actions (Tensor) –
log_std (Tensor) –
- Return type:
Tuple[Tensor, Tensor]
- Returns:
- mode()[source]¶
Returns the most likely action (deterministic output) from the probability distribution
- Return type:
Tensor
- Returns:
the most likely action
- class stable_baselines3.common.distributions.StateDependentNoiseDistribution(action_dim, full_std=True, use_expln=False, squash_output=False, learn_features=False, epsilon=1e-06)[source]¶
Distribution class for using generalized State Dependent Exploration (gSDE). Paper: https://arxiv.org/abs/2005.05719
It is used to create the noise exploration matrix and compute the log probability of an action with that noise.
- Parameters:
action_dim (int) – Dimension of the action space.
full_std (bool) – Whether to use (n_features x n_actions) parameters for the std instead of only (n_features,).
use_expln (bool) – Use expln() instead of exp() to ensure a positive standard deviation (cf. paper). It keeps the variance above zero and prevents it from growing too fast. In practice, exp() is usually enough.
squash_output (bool) – Whether to squash the output using a tanh function; this ensures bounds are satisfied.
learn_features (bool) – Whether to learn features for gSDE or not. This will enable gradients to be backpropagated through the features latent_sde in the code.
epsilon (float) – small value to avoid NaN due to numerical imprecision.
- actions_from_params(mean_actions, log_std, latent_sde, deterministic=False)[source]¶
Returns samples from the probability distribution given its parameters.
- Return type:
Tensor
- Returns:
actions
- entropy()[source]¶
Returns Shannon’s entropy of the probability distribution
- Return type:
Optional[Tensor]
- Returns:
the entropy, or None if no analytical form is known
- get_std(log_std)[source]¶
Get the standard deviation from the learned parameter (log of it by default). This ensures that the std is positive.
- Parameters:
log_std (Tensor) –
- Return type:
Tensor
- Returns:
- log_prob(actions)[source]¶
Returns the log likelihood
- Parameters:
actions (Tensor) – the taken actions
- Return type:
Tensor
- Returns:
The log likelihood of the distribution
- log_prob_from_params(mean_actions, log_std, latent_sde)[source]¶
Returns samples and the associated log probabilities from the probability distribution given its parameters.
- Return type:
Tuple[Tensor, Tensor]
- Returns:
actions and log prob
- mode()[source]¶
Returns the most likely action (deterministic output) from the probability distribution
- Return type:
Tensor
- Returns:
the most likely action
- proba_distribution(mean_actions, log_std, latent_sde)[source]¶
Create the distribution given its parameters (mean, std)
- Parameters:
mean_actions (Tensor) –
log_std (Tensor) –
latent_sde (Tensor) –
- Return type:
TypeVar(SelfStateDependentNoiseDistribution, bound=StateDependentNoiseDistribution)
- Returns:
- proba_distribution_net(latent_dim, log_std_init=-2.0, latent_sde_dim=None)[source]¶
Create the layers and parameter that represent the distribution: one output will be the deterministic action, the other parameter will be the standard deviation of the distribution that controls the weights of the noise matrix.
- Parameters:
latent_dim (int) – Dimension of the last layer of the policy (before the action layer)
log_std_init (float) – Initial value for the log standard deviation
latent_sde_dim (Optional[int]) – Dimension of the last layer of the features extractor for gSDE. By default, it is shared with the policy network.
- Return type:
Tuple[Module, Parameter]
- Returns:
- class stable_baselines3.common.distributions.TanhBijector(epsilon=1e-06)[source]¶
Bijective transformation of a probability distribution using a squashing function (tanh)
- Parameters:
epsilon (float) – small value to avoid NaN due to numerical imprecision.
- static atanh(x)[source]¶
Inverse of tanh: 0.5 * torch.log((1 + x) / (1 - x)). Taken from Pyro: https://github.com/pyro-ppl/pyro
- Return type:
Tensor
- stable_baselines3.common.distributions.kl_divergence(dist_true, dist_pred)[source]¶
Wrapper for the PyTorch implementation of the full form KL Divergence
- Parameters:
dist_true (Distribution) – the p distribution
dist_pred (Distribution) – the q distribution
- Return type:
Tensor
- Returns:
KL(dist_true||dist_pred)
- stable_baselines3.common.distributions.make_proba_distribution(action_space, use_sde=False, dist_kwargs=None)[source]¶
Return an instance of Distribution for the correct type of action space
- Parameters:
action_space (Space) – the input action space
use_sde (bool) – Force the use of StateDependentNoiseDistribution instead of DiagGaussianDistribution
dist_kwargs (Optional[Dict[str, Any]]) – Keyword arguments to pass to the probability distribution
- Return type:
Distribution
- Returns:
the appropriate Distribution object
- stable_baselines3.common.distributions.sum_independent_dims(tensor)[source]¶
Continuous actions are usually considered to be independent, so we can sum the components of the log_prob or the entropy.
- Parameters:
tensor (Tensor) – shape: (n_batch, n_actions) or (n_batch,)
- Return type:
Tensor
- Returns:
shape: (n_batch,)