Logger

To overwrite the default logger, you can pass one to the algorithm. Available formats are ["stdout", "csv", "log", "tensorboard", "json"].

Warning

When passing a custom logger object, this will overwrite tensorboard_log and verbose settings passed to the constructor.

from stable_baselines3 import A2C
from stable_baselines3.common.logger import configure

tmp_path = "/tmp/sb3_log/"
# set up logger
new_logger = configure(tmp_path, ["stdout", "csv", "tensorboard"])

model = A2C("MlpPolicy", "CartPole-v1", verbose=1)
# Set new logger
model.set_logger(new_logger)
model.learn(10000)

Explanation of logger output

Below are short explanations of the values logged in Stable-Baselines3 (SB3). Depending on the algorithm used and on the wrappers/callbacks applied, SB3 logs only a subset of these keys during training.

Below you can find an example of the logger output when training a PPO agent:

-----------------------------------------
| eval/                   |             |
|    mean_ep_length       | 200         |
|    mean_reward          | -157        |
| rollout/                |             |
|    ep_len_mean          | 200         |
|    ep_rew_mean          | -227        |
| time/                   |             |
|    fps                  | 972         |
|    iterations           | 19          |
|    time_elapsed         | 80          |
|    total_timesteps      | 77824       |
| train/                  |             |
|    approx_kl            | 0.037781604 |
|    clip_fraction        | 0.243       |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.06       |
|    explained_variance   | 0.999       |
|    learning_rate        | 0.001       |
|    loss                 | 0.245       |
|    n_updates            | 180         |
|    policy_gradient_loss | -0.00398    |
|    std                  | 0.205       |
|    value_loss           | 0.226       |
-----------------------------------------

eval/

All eval/ values are computed by the EvalCallback.

  • mean_ep_length: Mean episode length

  • mean_reward: Mean episodic reward (during evaluation)

  • success_rate: Mean success rate during evaluation (1.0 means 100% success), the environment info dict must contain an is_success key to compute that value

rollout/

  • ep_len_mean: Mean episode length (averaged over 100 episodes)

  • ep_rew_mean: Mean episodic training reward (averaged over 100 episodes), a Monitor wrapper is required to compute that value (automatically added by make_vec_env).

  • exploration_rate: Current value of the exploration rate when using DQN, it corresponds to the fraction of actions taken randomly (epsilon of the “epsilon-greedy” exploration)

  • success_rate: Mean success rate during training (averaged over 100 episodes), you must pass an extra argument to the Monitor wrapper to log that value (info_keywords=("is_success",)) and provide info["is_success"]=True/False on the final step of the episode

time/

  • episodes: Total number of episodes

  • fps: Number of frames per second (includes time taken by gradient update)

  • iterations: Number of iterations (data collection + policy update for A2C/PPO)

  • time_elapsed: Time in seconds since the beginning of training

  • total_timesteps: Total number of timesteps (steps in the environments)

train/

  • actor_loss: Current value for the actor loss for off-policy algorithms

  • approx_kl: Approximate mean KL divergence between the old and new policy (for PPO); it estimates how much the policy changed during the update

  • clip_fraction: Mean fraction of the surrogate loss that was clipped (above the clip_range threshold) for PPO

  • clip_range: Current value of the clipping factor for the surrogate loss of PPO

  • critic_loss: Current value for the critic function loss for off-policy algorithms, usually error between value function output and TD(0), temporal difference estimate

  • ent_coef: Current value of the entropy coefficient (when using SAC)

  • ent_coef_loss: Current value of the entropy coefficient loss (when using SAC)

  • entropy_loss: Mean value of the entropy loss (negative of the average policy entropy)

  • explained_variance: Fraction of the return variance explained by the value function, see https://scikit-learn.org/stable/modules/model_evaluation.html#explained-variance-score (ev=0 => might as well have predicted zero, ev=1 => perfect prediction, ev<0 => worse than just predicting zero)

  • learning_rate: Current learning rate value

  • loss: Current total loss value

  • n_updates: Number of gradient updates applied so far

  • policy_gradient_loss: Current value of the policy gradient loss (its value does not have much meaning)

  • value_loss: Current value for the value function loss for on-policy algorithms, usually error between value function output and Monte-Carlo estimate (or TD(lambda) estimate)

  • std: Current standard deviation of the noise when using generalized State-Dependent Exploration (gSDE)
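
Two of the train/ values above can be made concrete with a few lines of plain Python. This is only a sketch, not SB3's own implementation: explained_variance follows the scikit-learn definition linked above, and clip_fraction assumes the clipped quantity is the probability ratio between the new and old policy, compared against 1 ± clip_range.

```python
from statistics import pvariance


def explained_variance(y_pred, y_true):
    """Return 1 - Var[y_true - y_pred] / Var[y_true]."""
    var_true = pvariance(y_true)
    if var_true == 0:
        return float("nan")  # undefined when the returns are constant
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - pvariance(residuals) / var_true


def clip_fraction(ratios, clip_range):
    """Fraction of ratios outside [1 - clip_range, 1 + clip_range] (assumed definition)."""
    clipped = [abs(r - 1.0) > clip_range for r in ratios]
    return sum(clipped) / len(clipped)


returns = [1.0, 2.0, 3.0, 4.0]
ev_perfect = explained_variance(returns, returns)                # ev = 1
ev_zero = explained_variance([0.0] * 4, returns)                 # ev = 0
ev_worse = explained_variance([4.0, 3.0, 2.0, 1.0], returns)     # ev < 0
fraction = clip_fraction([0.7, 1.0, 1.1, 1.5], clip_range=0.2)   # 2 of 4 clipped
print(ev_perfect, ev_zero, ev_worse, fraction)
```

The three explained-variance cases match the reading guide given for that key: a perfect value function, one that is no better than predicting zero, and one that is worse.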

class stable_baselines3.common.logger.CSVOutputFormat(filename)[source]

Log to a file, in a CSV format

Parameters

filename (str) – the file to write the log to

close()[source]

closes the file

Return type

None

write(key_values, key_excluded, step=0)[source]

Write a dictionary to file

Parameters
  • key_values (Dict[str, Any]) –

  • key_excluded (Dict[str, Union[str, Tuple[str, ...]]]) –

  • step (int) –

Return type

None

class stable_baselines3.common.logger.Figure(figure, close)[source]

Figure data class storing a matplotlib figure and whether to close the figure after logging it

Parameters
  • figure (figure) – figure to log

  • close (bool) – if true, close the figure after logging it

exception stable_baselines3.common.logger.FormatUnsupportedError(unsupported_formats, value_description)[source]

Custom error to display informative message when a value is not supported by some formats.

Parameters
  • unsupported_formats (Sequence[str]) – A sequence of unsupported formats, for instance ["stdout"].

  • value_description (str) – Description of the value that cannot be logged by this format.

class stable_baselines3.common.logger.HumanOutputFormat(filename_or_file, max_length=36)[source]

A human-readable output format producing ASCII tables of key-value pairs.

Set attribute max_length to change the maximum length of keys and values to write to output (or specify it when calling __init__).

Parameters
  • filename_or_file (Union[str, TextIO]) – the file to write the log to

  • max_length (int) – the maximum length of keys and values to write to output. Outputs longer than this will be truncated. An error will be raised if multiple keys are truncated to the same value. The maximum output width will be 2*max_length + 7. The default of 36 produces output no longer than 79 characters wide.

close()[source]

closes the file

Return type

None

write(key_values, key_excluded, step=0)[source]

Write a dictionary to file

Parameters
  • key_values (Dict) –

  • key_excluded (Dict) –

  • step (int) –

Return type

None

write_sequence(sequence)[source]

Write an array to file

Parameters

sequence (List) –

Return type

None

class stable_baselines3.common.logger.Image(image, dataformats)[source]

Image data class storing an image and data format

Parameters
  • image (Union[Tensor, ndarray, str]) – image to log

  • dataformats (str) – Image data format specification of the form NCHW, NHWC, CHW, HWC, HW, WH, etc. More info in the add_image method doc at https://pytorch.org/docs/stable/tensorboard.html. Gym envs normally use ‘HWC’ (channel last).

class stable_baselines3.common.logger.JSONOutputFormat(filename)[source]

Log to a file, in the JSON format

Parameters

filename (str) – the file to write the log to

close()[source]

closes the file

Return type

None

write(key_values, key_excluded, step=0)[source]

Write a dictionary to file

Parameters
  • key_values (Dict[str, Any]) –

  • key_excluded (Dict[str, Union[str, Tuple[str, ...]]]) –

  • step (int) –

Return type

None

class stable_baselines3.common.logger.KVWriter[source]

Key Value writer

close()[source]

Close owned resources

Return type

None

write(key_values, key_excluded, step=0)[source]

Write a dictionary to file

Parameters
  • key_values (Dict[str, Any]) –

  • key_excluded (Dict[str, Union[str, Tuple[str, ...]]]) –

  • step (int) –

Return type

None

class stable_baselines3.common.logger.Logger(folder, output_formats)[source]

The logger class.

Parameters
  • folder (Optional[str]) – the logging location

  • output_formats (List[KVWriter]) – the list of output formats

close()[source]

closes the file

Return type

None

debug(*args)[source]

Write the sequence of args, with no separators, to the console and output files (if you’ve configured an output file), using the DEBUG level.

Parameters

args – log the arguments

Return type

None

dump(step=0)[source]

Write all of the diagnostics from the current iteration

Return type

None

error(*args)[source]

Write the sequence of args, with no separators, to the console and output files (if you’ve configured an output file), using the ERROR level.

Parameters

args – log the arguments

Return type

None

get_dir()[source]

Get the directory that log files are being written to. Will be None if there is no output directory (i.e., if you didn’t call start).

Return type

str

Returns

the logging directory

info(*args)[source]

Write the sequence of args, with no separators, to the console and output files (if you’ve configured an output file), using the INFO level.

Parameters

args – log the arguments

Return type

None

log(*args, level=20)[source]

Write the sequence of args, with no separators, to the console and output files (if you’ve configured an output file).

level: int. (see logger.py docs) If the global logger level is higher than the level argument here, don’t print to stdout.

Parameters
  • args – log the arguments

  • level (int) – the logging level (can be DEBUG=10, INFO=20, WARN=30, ERROR=40, DISABLED=50)

Return type

None

record(key, value, exclude=None)[source]

Log a value of some diagnostic. Call this once for each diagnostic quantity, each iteration. If called many times, the last value will be used.

Parameters
  • key (str) – save to log this key

  • value (Any) – save to log this value

  • exclude (Union[str, Tuple[str, ...], None]) – outputs to be excluded

Return type

None

record_mean(key, value, exclude=None)[source]

The same as record(), but if called many times, the values are averaged.

Parameters
  • key (str) – save to log this key

  • value (Any) – save to log this value

  • exclude (Union[str, Tuple[str, ...], None]) – outputs to be excluded

Return type

None

set_level(level)[source]

Set logging threshold on current logger.

Parameters

level (int) – the logging level (can be DEBUG=10, INFO=20, WARN=30, ERROR=40, DISABLED=50)

Return type

None

warn(*args)[source]

Write the sequence of args, with no separators, to the console and output files (if you’ve configured an output file), using the WARN level.

Parameters

args – log the arguments

Return type

None

class stable_baselines3.common.logger.SeqWriter[source]

sequence writer

write_sequence(sequence)[source]

Write an array to file

Parameters

sequence (List) –

Return type

None

class stable_baselines3.common.logger.TensorBoardOutputFormat(folder)[source]

Dumps key/value pairs into TensorBoard’s numeric format.

Parameters

folder (str) – the folder to write the log to

close()[source]

closes the file

Return type

None

write(key_values, key_excluded, step=0)[source]

Write a dictionary to file

Parameters
  • key_values (Dict[str, Any]) –

  • key_excluded (Dict[str, Union[str, Tuple[str, ...]]]) –

  • step (int) –

Return type

None

class stable_baselines3.common.logger.Video(frames, fps)[source]

Video data class storing the video frames and the frames per second

Parameters
  • frames (Tensor) – frames to create the video from

  • fps (Union[float, int]) – frames per second

stable_baselines3.common.logger.configure(folder=None, format_strings=None)[source]

Configure the current logger.

Parameters
  • folder (Optional[str]) – the save location (if None, $SB3_LOGDIR, if still None, tempdir/SB3-[date & time])

  • format_strings (Optional[List[str]]) – the output logging format (if None, $SB3_LOG_FORMAT, if still None, [‘stdout’, ‘log’, ‘csv’])

Return type

Logger

Returns

The logger object.

stable_baselines3.common.logger.filter_excluded_keys(key_values, key_excluded, _format)[source]

Filters the keys specified by key_excluded for the specified format

Parameters
  • key_values (Dict[str, Any]) – log dictionary to be filtered

  • key_excluded (Dict[str, Union[str, Tuple[str, ...]]]) – keys to be excluded per format

  • _format (str) – format for which this filter is run

Return type

Dict[str, Any]

Returns

dict without the excluded keys

stable_baselines3.common.logger.make_output_format(_format, log_dir, log_suffix='')[source]

return a logger for the requested format

Parameters
  • _format (str) – the requested format to log to (‘stdout’, ‘log’, ‘json’, ‘csv’ or ‘tensorboard’)

  • log_dir (str) – the logging directory

  • log_suffix (str) – the suffix for the log file

Return type

KVWriter

Returns

the logger

stable_baselines3.common.logger.read_csv(filename)[source]

read a csv file using pandas

Parameters

filename (str) – the file path to read

Return type

DataFrame

Returns

the data in the csv

stable_baselines3.common.logger.read_json(filename)[source]

read a json file using pandas

Parameters

filename (str) – the file path to read

Return type

DataFrame

Returns

the data in the json