Logger
To overwrite the default logger, you can pass one to the algorithm.
Available formats are ["stdout", "csv", "log", "tensorboard", "json"].
Warning
When passing a custom logger object, it will overwrite the tensorboard_log and verbose settings passed to the constructor.
from stable_baselines3 import A2C
from stable_baselines3.common.logger import configure
tmp_path = "/tmp/sb3_log/"
# set up logger
new_logger = configure(tmp_path, ["stdout", "csv", "tensorboard"])
model = A2C("MlpPolicy", "CartPole-v1", verbose=1)
# Set new logger
model.set_logger(new_logger)
model.learn(10000)
Explanation of logger output
You can find below short explanations of the values logged in Stable-Baselines3 (SB3). Depending on the algorithm used and the wrappers/callbacks applied, SB3 only logs a subset of those keys during training.
Below you can find an example of the logger output when training a PPO agent:
-----------------------------------------
| eval/ | |
| mean_ep_length | 200 |
| mean_reward | -157 |
| rollout/ | |
| ep_len_mean | 200 |
| ep_rew_mean | -227 |
| time/ | |
| fps | 972 |
| iterations | 19 |
| time_elapsed | 80 |
| total_timesteps | 77824 |
| train/ | |
| approx_kl | 0.037781604 |
| clip_fraction | 0.243 |
| clip_range | 0.2 |
| entropy_loss | -1.06 |
| explained_variance | 0.999 |
| learning_rate | 0.001 |
| loss | 0.245 |
| n_updates | 180 |
| policy_gradient_loss | -0.00398 |
| std | 0.205 |
| value_loss | 0.226 |
-----------------------------------------
eval/
All eval/ values are computed by the EvalCallback (see the example after this list).
- mean_ep_length: Mean episode length
- mean_reward: Mean episodic reward (during evaluation)
- success_rate: Mean success rate during evaluation (1.0 means 100% success); the environment info dict must contain an is_success key to compute that value
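A minimal sketch of how these values get produced, assuming a Gymnasium environment; the environment id, evaluation frequency and timestep budget below are placeholders:
import gymnasium as gym

from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback

# Separate environment instance used only for evaluation
eval_env = gym.make("Pendulum-v1")

eval_callback = EvalCallback(
    eval_env,
    eval_freq=10_000,    # evaluate every 10_000 environment steps
    n_eval_episodes=5,   # eval/mean_reward is averaged over 5 episodes
)

model = PPO("MlpPolicy", "Pendulum-v1", verbose=1)
# eval/mean_reward and eval/mean_ep_length then appear in the logger output
model.learn(total_timesteps=50_000, callback=eval_callback)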
rollout/
- ep_len_mean: Mean episode length (averaged over stats_window_size episodes, 100 by default)
- ep_rew_mean: Mean episodic training reward (averaged over stats_window_size episodes, 100 by default); a Monitor wrapper is required to compute that value (automatically added by make_vec_env)
- exploration_rate: Current value of the exploration rate when using DQN; it corresponds to the fraction of actions taken randomly (the epsilon of the “epsilon-greedy” exploration)
- success_rate: Mean success rate during training (averaged over stats_window_size episodes, 100 by default); you must pass an extra argument to the Monitor wrapper to log that value (info_keywords=("is_success",)) and provide info["is_success"]=True/False on the final step of the episode (see the sketch after this list)
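A minimal sketch of the Monitor setup needed for rollout/success_rate; the environment id is a placeholder for an env that reports is_success:
import gymnasium as gym

from stable_baselines3.common.monitor import Monitor

# "MyGoalEnv-v0" is hypothetical: it must set info["is_success"] = True/False
# on the final step of each episode
env = Monitor(gym.make("MyGoalEnv-v0"), info_keywords=("is_success",))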
time/
- episodes: Total number of episodes
- fps: Number of frames per second (includes the time taken by gradient updates)
- iterations: Number of iterations (data collection + policy update for A2C/PPO)
- time_elapsed: Time in seconds since the beginning of training
- total_timesteps: Total number of timesteps (steps in the environments)
train/
- actor_loss: Current value of the actor loss for off-policy algorithms
- approx_kl: Approximate mean KL divergence between the old and new policy (for PPO), an estimate of how much the policy changed in the update
- clip_fraction: Mean fraction of the surrogate loss that was clipped (above the clip_range threshold) for PPO
- clip_range: Current value of the clipping factor for the surrogate loss of PPO
- critic_loss: Current value of the critic function loss for off-policy algorithms, usually the error between the value function output and the TD(0) (temporal difference) estimate
- ent_coef: Current value of the entropy coefficient (when using SAC)
- ent_coef_loss: Current value of the entropy coefficient loss (when using SAC)
- entropy_loss: Mean value of the entropy loss (negative of the average policy entropy)
- explained_variance: Fraction of the return variance explained by the value function, see https://scikit-learn.org/stable/modules/model_evaluation.html#explained-variance-score (ev=0 => might as well have predicted zero, ev=1 => perfect prediction, ev<0 => worse than just predicting zero); a small numeric check is sketched after this list
- learning_rate: Current learning rate value
- loss: Current total loss value
- n_updates: Number of gradient updates applied so far
- policy_gradient_loss: Current value of the policy gradient loss (its value does not have much meaning)
- value_loss: Current value of the value function loss for on-policy algorithms, usually the error between the value function output and the Monte Carlo estimate (or TD(lambda) estimate)
- std: Current standard deviation of the noise when using generalized State-Dependent Exploration (gSDE)
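As a quick numeric check of the explained_variance definition above, here is an illustrative re-implementation of the formula (SB3 ships its own helper for this; the sample returns are made up):
import numpy as np

def explained_variance(y_pred: np.ndarray, y_true: np.ndarray) -> float:
    # ev = 1 - Var[y_true - y_pred] / Var[y_true]
    var_y = np.var(y_true)
    return np.nan if var_y == 0 else float(1 - np.var(y_true - y_pred) / var_y)

returns = np.array([1.0, 2.0, 3.0])
print(explained_variance(returns, returns))      # 1.0: perfect prediction
print(explained_variance(np.zeros(3), returns))  # 0.0: no better than predicting zero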
- class stable_baselines3.common.logger.CSVOutputFormat(filename)[source]
Log to a file, in a CSV format
- Parameters:
  - filename (str) – the file to write the log to
- class stable_baselines3.common.logger.Figure(figure, close)[source]
Figure data class storing a matplotlib figure and whether to close the figure after logging it
- Parameters:
  - figure (Figure) – figure to log
  - close (bool) – if true, close the figure after logging it
- exception stable_baselines3.common.logger.FormatUnsupportedError(unsupported_formats, value_description)[source]
Custom error to display informative message when a value is not supported by some formats.
- Parameters:
  - unsupported_formats (Sequence[str]) – A sequence of unsupported formats, for instance ["stdout"].
  - value_description (str) – Description of the value that cannot be logged by this format.
- class stable_baselines3.common.logger.HParam(hparam_dict, metric_dict)[source]
Hyperparameter data class storing hyperparameters and metrics in dictionaries
- Parameters:
  - hparam_dict (Mapping[str, Union[bool, str, float, None]]) – key-value pairs of hyperparameters to log
  - metric_dict (Mapping[str, float]) – key-value pairs of metrics to log. A non-empty metrics dict is required to display hyperparameters in the corresponding TensorBoard section.
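A minimal sketch of recording hyperparameters from a custom callback; the callback name and the chosen keys are illustrative:
from stable_baselines3.common.callbacks import BaseCallback
from stable_baselines3.common.logger import HParam

class HParamCallback(BaseCallback):
    def _on_training_start(self) -> None:
        hparam_dict = {
            "algorithm": self.model.__class__.__name__,
            "gamma": self.model.gamma,
        }
        # A non-empty metric_dict is required for the hyperparameters
        # to show up in the TensorBoard HPARAMS tab
        metric_dict = {"rollout/ep_rew_mean": 0.0}
        self.logger.record(
            "hparams",
            HParam(hparam_dict, metric_dict),
            exclude=("stdout", "log", "json", "csv"),
        )

    def _on_step(self) -> bool:
        return True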
- class stable_baselines3.common.logger.HumanOutputFormat(filename_or_file, max_length=36)[source]
A human-readable output format producing ASCII tables of key-value pairs.
Set attribute max_length to change the maximum length of keys and values to write to output (or specify it when calling __init__).
- Parameters:
  - filename_or_file (Union[str, TextIO]) – the file to write the log to
  - max_length (int) – the maximum length of keys and values to write to output. Outputs longer than this will be truncated. An error will be raised if multiple keys are truncated to the same value. The maximum output width will be 2*max_length + 7. The default of 36 produces output no longer than 79 characters wide.
- class stable_baselines3.common.logger.Image(image, dataformats)[source]
Image data class storing an image and data format
- Parameters:
image (
Union
[Tensor
,ndarray
,str
]) – image to logdataformats (
str
) – Image data format specification of the form NCHW, NHWC, CHW, HWC, HW, WH, etc. More info in add_image method doc at https://pytorch.org/docs/stable/tensorboard.html Gym envs normally use ‘HWC’ (channel last)
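A minimal sketch of logging an image in channel-last (‘HWC’) layout from a custom callback; the callback name and the random image are illustrative:
import numpy as np

from stable_baselines3.common.callbacks import BaseCallback
from stable_baselines3.common.logger import Image

class ImageRecorderCallback(BaseCallback):
    def _on_step(self) -> bool:
        # Random 64x64 RGB image, channel last ("HWC")
        image = np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8)
        self.logger.record(
            "trajectory/image",
            Image(image, "HWC"),
            exclude=("stdout", "log", "json", "csv"),
        )
        return True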
- class stable_baselines3.common.logger.JSONOutputFormat(filename)[source]
Log to a file, in the JSON format
- Parameters:
  - filename (str) – the file to write the log to
- class stable_baselines3.common.logger.Logger(folder, output_formats)[source]
The logger class.
- Parameters:
  - folder (Optional[str]) – the logging location
  - output_formats (List[KVWriter]) – the list of output formats
- debug(*args)[source]
Write the sequence of args, with no separators, to the console and output files (if you’ve configured an output file), using the DEBUG level.
- Parameters:
args – log the arguments
- Return type:
None
- error(*args)[source]
Write the sequence of args, with no separators, to the console and output files (if you’ve configured an output file), using the ERROR level.
- Parameters:
args – log the arguments
- Return type:
None
- get_dir()[source]
Get the directory that log files are being written to. Will be None if there is no output directory (i.e., if you didn’t call start).
- Return type:
  Optional[str]
- Returns:
  the logging directory
- info(*args)[source]
Write the sequence of args, with no separators, to the console and output files (if you’ve configured an output file), using the INFO level.
- Parameters:
args – log the arguments
- Return type:
None
- log(*args, level=20)[source]
Write the sequence of args, with no separators, to the console and output files (if you’ve configured an output file).
If the global logger level is higher than the level argument here, the message is not printed to stdout (see logger.py docs).
- Parameters:
  - args – log the arguments
  - level (int) – the logging level (can be DEBUG=10, INFO=20, WARN=30, ERROR=40, DISABLED=50)
- Return type:
None
- record(key, value, exclude=None)[source]
Log a value of some diagnostic. Call this once for each diagnostic quantity, each iteration. If called many times, the last value will be used.
- Parameters:
  - key (str) – save to log this key
  - value (Any) – save to log this value
  - exclude (Union[str, Tuple[str, ...], None]) – outputs to be excluded
- Return type:
None
- record_mean(key, value, exclude=None)[source]
The same as record(), but if called many times, the values are averaged.
- Parameters:
  - key (str) – save to log this key
  - value (Optional[float]) – save to log this value
  - exclude (Union[str, Tuple[str, ...], None]) – outputs to be excluded
- Return type:
None
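A minimal sketch contrasting record() and record_mean(); it also uses the logger’s dump() method to flush the recorded values to the configured outputs (the keys, values and step are made up):
from stable_baselines3.common.logger import configure

logger = configure("/tmp/sb3_log/", ["stdout"])

# record() keeps only the last value written for a key...
logger.record("train/loss", 0.5)
logger.record("train/loss", 0.3)  # overwrites 0.5

# ...while record_mean() averages all values written since the last dump
logger.record_mean("rollout/ep_rew_mean", -200.0)
logger.record_mean("rollout/ep_rew_mean", -254.0)  # reported as -227.0

logger.dump(step=1)  # write the accumulated key/value pairs to stdout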
- set_level(level)[source]
Set logging threshold on current logger.
- Parameters:
  - level (int) – the logging level (can be DEBUG=10, INFO=20, WARN=30, ERROR=40, DISABLED=50)
- Return type:
  None
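A small sketch of how the threshold interacts with message levels, assuming the module-level WARN constant from stable_baselines3.common.logger:
from stable_baselines3.common.logger import WARN, configure

logger = configure(None, ["stdout"])
logger.set_level(WARN)  # raise the threshold to WARN (30)
logger.info("not printed: INFO (20) is below the WARN (30) threshold")
logger.error("printed: ERROR (40) is at or above the threshold")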
- class stable_baselines3.common.logger.TensorBoardOutputFormat(folder)[source]
Dumps key/value pairs into TensorBoard’s numeric format.
- Parameters:
folder (
str
) – the folder to write the log to
- class stable_baselines3.common.logger.Video(frames, fps)[source]
Video data class storing the video frames and the frames per second
- Parameters:
  - frames (Tensor) – frames to create the video from
  - fps (float) – frames per second
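A minimal sketch of logging a short dummy clip; the (N, T, C, H, W) frame layout is the one expected by TensorBoard’s add_video, and the random frames are placeholders:
import numpy as np
import torch as th

from stable_baselines3.common.logger import Video, configure

logger = configure("/tmp/sb3_log/", ["tensorboard"])

# 1 video, 16 frames, 3 channels, 64x64 pixels
frames = th.from_numpy(np.random.randint(0, 255, (1, 16, 3, 64, 64), dtype=np.uint8))

# Videos are only supported by the TensorBoard writer, hence the exclude tuple
logger.record("trajectory/video", Video(frames, fps=20), exclude=("stdout", "log", "json", "csv"))
logger.dump(step=0)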
- stable_baselines3.common.logger.configure(folder=None, format_strings=None)[source]
Configure the current logger.
- Parameters:
  - folder (Optional[str]) – the save location (if None, $SB3_LOGDIR, if still None, tempdir/SB3-[date & time])
  - format_strings (Optional[List[str]]) – the output logging format (if None, $SB3_LOG_FORMAT, if still None, [‘stdout’, ‘log’, ‘csv’])
- Return type:
  Logger
- Returns:
  The logger object.
- stable_baselines3.common.logger.filter_excluded_keys(key_values, key_excluded, _format)[source]
Filters the keys specified by key_excluded for the specified format.
- Parameters:
  - key_values (Dict[str, Any]) – log dictionary to be filtered
  - key_excluded (Dict[str, Tuple[str, ...]]) – keys to be excluded per format
  - _format (str) – format for which this filter is run
- Return type:
  Dict[str, Any]
- Returns:
  dict without the excluded keys
- stable_baselines3.common.logger.make_output_format(_format, log_dir, log_suffix='')[source]
Return a logger for the requested format.
- Parameters:
  - _format (str) – the requested format to log to (‘stdout’, ‘log’, ‘json’, ‘csv’ or ‘tensorboard’)
  - log_dir (str) – the logging directory
  - log_suffix (str) – the suffix for the log file
- Return type:
  KVWriter
- Returns:
  the logger
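A minimal sketch of assembling a Logger by hand from individual writers; configure() above is essentially a convenience wrapper around this (the directory path is a placeholder):
import os

from stable_baselines3.common.logger import Logger, make_output_format

log_dir = "/tmp/sb3_log/"
os.makedirs(log_dir, exist_ok=True)

# One KVWriter per requested format
writers = [make_output_format(f, log_dir) for f in ("stdout", "csv")]
logger = Logger(folder=log_dir, output_formats=writers)

logger.record("rollout/ep_rew_mean", -227.0)
logger.dump(step=77_824)  # flush the recorded values to both writers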