Logger
To overwrite the default logger, you can pass one to the algorithm.
Available formats are ["stdout", "csv", "log", "tensorboard", "json"].
Warning
When passing a custom logger object, it will overwrite the tensorboard_log and verbose settings passed to the constructor.
from stable_baselines3 import A2C
from stable_baselines3.common.logger import configure
tmp_path = "/tmp/sb3_log/"
# set up logger
new_logger = configure(tmp_path, ["stdout", "csv", "tensorboard"])
model = A2C("MlpPolicy", "CartPole-v1", verbose=1)
# Set new logger
model.set_logger(new_logger)
model.learn(10000)
Explanation of logger output
You can find below short explanations of the values logged in Stable-Baselines3 (SB3). Depending on the algorithm used and the wrappers/callbacks applied, SB3 only logs a subset of those keys during training.
Below you can find an example of the logger output when training a PPO agent:
-----------------------------------------
| eval/ | |
| mean_ep_length | 200 |
| mean_reward | -157 |
| rollout/ | |
| ep_len_mean | 200 |
| ep_rew_mean | -227 |
| time/ | |
| fps | 972 |
| iterations | 19 |
| time_elapsed | 80 |
| total_timesteps | 77824 |
| train/ | |
| approx_kl | 0.037781604 |
| clip_fraction | 0.243 |
| clip_range | 0.2 |
| entropy_loss | -1.06 |
| explained_variance | 0.999 |
| learning_rate | 0.001 |
| loss | 0.245 |
| n_updates | 180 |
| policy_gradient_loss | -0.00398 |
| std | 0.205 |
| value_loss | 0.226 |
-----------------------------------------
eval/
All eval/ values are computed by the EvalCallback (see the example after this list).
- mean_ep_length: Mean episode length
- mean_reward: Mean episodic reward (during evaluation)
- success_rate: Mean success rate during evaluation (1.0 means 100% success); the environment info dict must contain an is_success key to compute that value
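A minimal sketch of how these values get produced, assuming a Gymnasium environment; the environment id, evaluation frequency and timestep budget below are placeholders:
import gymnasium as gym

from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback

# Separate environment instance used only for evaluation
eval_env = gym.make("Pendulum-v1")

eval_callback = EvalCallback(
    eval_env,
    eval_freq=10_000,    # evaluate every 10_000 environment steps
    n_eval_episodes=5,   # eval/mean_reward is averaged over 5 episodes
)

model = PPO("MlpPolicy", "Pendulum-v1", verbose=1)
# eval/mean_reward and eval/mean_ep_length then appear in the logger output
model.learn(total_timesteps=50_000, callback=eval_callback)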
rollout/
- ep_len_mean: Mean episode length (averaged over stats_window_size episodes, 100 by default)
- ep_rew_mean: Mean episodic training reward (averaged over stats_window_size episodes, 100 by default); a Monitor wrapper is required to compute that value (automatically added by make_vec_env)
- exploration_rate: Current value of the exploration rate when using DQN; it corresponds to the fraction of actions taken randomly (the epsilon of the “epsilon-greedy” exploration)
- success_rate: Mean success rate during training (averaged over stats_window_size episodes, 100 by default); you must pass an extra argument to the Monitor wrapper to log that value (info_keywords=("is_success",)) and provide info["is_success"]=True/False on the final step of the episode (see the sketch after this list)
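A minimal sketch of the Monitor setup needed for rollout/success_rate; the environment id is a placeholder for an env that reports is_success:
import gymnasium as gym

from stable_baselines3.common.monitor import Monitor

# "MyGoalEnv-v0" is hypothetical: it must set info["is_success"] = True/False
# on the final step of each episode
env = Monitor(gym.make("MyGoalEnv-v0"), info_keywords=("is_success",))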
time/
- episodes: Total number of episodes
- fps: Number of frames per second (includes the time taken by gradient updates)
- iterations: Number of iterations (data collection + policy update for A2C/PPO)
- time_elapsed: Time in seconds since the beginning of training
- total_timesteps: Total number of timesteps (steps in the environments)
train/
- actor_loss: Current value of the actor loss for off-policy algorithms
- approx_kl: Approximate mean KL divergence between the old and new policy (for PPO), an estimate of how much the policy changed in the update
- clip_fraction: Mean fraction of the surrogate loss that was clipped (above the clip_range threshold) for PPO
- clip_range: Current value of the clipping factor for the surrogate loss of PPO
- critic_loss: Current value of the critic function loss for off-policy algorithms, usually the error between the value function output and the TD(0) (temporal difference) estimate
- ent_coef: Current value of the entropy coefficient (when using SAC)
- ent_coef_loss: Current value of the entropy coefficient loss (when using SAC)
- entropy_loss: Mean value of the entropy loss (negative of the average policy entropy)
- explained_variance: Fraction of the return variance explained by the value function, see https://scikit-learn.org/stable/modules/model_evaluation.html#explained-variance-score (ev=0 => might as well have predicted zero, ev=1 => perfect prediction, ev<0 => worse than just predicting zero); a small numeric check is sketched after this list
- learning_rate: Current learning rate value
- loss: Current total loss value
- n_updates: Number of gradient updates applied so far
- policy_gradient_loss: Current value of the policy gradient loss (its value does not have much meaning)
- value_loss: Current value of the value function loss for on-policy algorithms, usually the error between the value function output and the Monte Carlo estimate (or TD(lambda) estimate)
- std: Current standard deviation of the noise when using generalized State-Dependent Exploration (gSDE)
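As a quick numeric check of the explained_variance definition above, here is an illustrative re-implementation of the formula (SB3 ships its own helper for this; the sample returns are made up):
import numpy as np

def explained_variance(y_pred: np.ndarray, y_true: np.ndarray) -> float:
    # ev = 1 - Var[y_true - y_pred] / Var[y_true]
    var_y = np.var(y_true)
    return np.nan if var_y == 0 else float(1 - np.var(y_true - y_pred) / var_y)

returns = np.array([1.0, 2.0, 3.0])
print(explained_variance(returns, returns))      # 1.0: perfect prediction
print(explained_variance(np.zeros(3), returns))  # 0.0: no better than predicting zero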
- class stable_baselines3.common.logger.CSVOutputFormat(filename)[source]
Log to a file, in a CSV format
- Parameters:
  - filename (str) – the file to write the log to
- class stable_baselines3.common.logger.Figure(figure, close)[source]
Figure data class storing a matplotlib figure and whether to close the figure after logging it
- Parameters:
  - figure (Figure) – figure to log
  - close (bool) – if true, close the figure after logging it
- exception stable_baselines3.common.logger.FormatUnsupportedError(unsupported_formats, value_description)[source]
Custom error to display informative message when a value is not supported by some formats.
- Parameters:
  - unsupported_formats (Sequence[str]) – A sequence of unsupported formats, for instance ["stdout"].
  - value_description (str) – Description of the value that cannot be logged by this format.
- class stable_baselines3.common.logger.HParam(hparam_dict, metric_dict)[source]
Hyperparameter data class storing hyperparameters and metrics in dictionaries
- Parameters:
  - hparam_dict (Mapping[str, Union[bool, str, float, None]]) – key-value pairs of hyperparameters to log
  - metric_dict (Mapping[str, float]) – key-value pairs of metrics to log. A non-empty metrics dict is required to display hyperparameters in the corresponding TensorBoard section.
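A minimal sketch of recording hyperparameters from a custom callback; the callback name and the chosen keys are illustrative:
from stable_baselines3.common.callbacks import BaseCallback
from stable_baselines3.common.logger import HParam

class HParamCallback(BaseCallback):
    def _on_training_start(self) -> None:
        hparam_dict = {
            "algorithm": self.model.__class__.__name__,
            "gamma": self.model.gamma,
        }
        # A non-empty metric_dict is required for the hyperparameters
        # to show up in the TensorBoard HPARAMS tab
        metric_dict = {"rollout/ep_rew_mean": 0.0}
        self.logger.record(
            "hparams",
            HParam(hparam_dict, metric_dict),
            exclude=("stdout", "log", "json", "csv"),
        )

    def _on_step(self) -> bool:
        return True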
- class stable_baselines3.common.logger.HumanOutputFormat(filename_or_file, max_length=36)[source]
A human-readable output format producing ASCII tables of key-value pairs.
Set attribute max_length to change the maximum length of keys and values to write to output (or specify it when calling __init__).
- Parameters:
  - filename_or_file (Union[str, TextIO]) – the file to write the log to
  - max_length (int) – the maximum length of keys and values to write to output. Outputs longer than this will be truncated. An error will be raised if multiple keys are truncated to the same value. The maximum output width will be 2*max_length + 7. The default of 36 produces output no longer than 79 characters wide.
- class stable_baselines3.common.logger.Image(image, dataformats)[source]
Image data class storing an image and data format
- Parameters:
image (
Union
[Tensor
,ndarray
,str
]) – image to logdataformats (
str
) – Image data format specification of the form NCHW, NHWC, CHW, HWC, HW, WH, etc. More info in add_image method doc at https://pytorch.org/docs/stable/tensorboard.html Gym envs normally use ‘HWC’ (channel last)
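A minimal sketch of logging an image in channel-last (‘HWC’) layout from a custom callback; the callback name and the random image are illustrative:
import numpy as np

from stable_baselines3.common.callbacks import BaseCallback
from stable_baselines3.common.logger import Image

class ImageRecorderCallback(BaseCallback):
    def _on_step(self) -> bool:
        # Random 64x64 RGB image, channel last ("HWC")
        image = np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8)
        self.logger.record(
            "trajectory/image",
            Image(image, "HWC"),
            exclude=("stdout", "log", "json", "csv"),
        )
        return True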
- class stable_baselines3.common.logger.JSONOutputFormat(filename)[source]
Log to a file, in the JSON format
- Parameters:
  - filename (str) – the file to write the log to
- class stable_baselines3.common.logger.Logger(folder, output_formats)[source]
The logger class.
- Parameters:
  - folder (Optional[str]) – the logging location
  - output_formats (List[KVWriter]) – the list of output formats
- debug(*args)[source]
Write the sequence of args, with no separators, to the console and output files (if you’ve configured an output file), using the DEBUG level.
- Parameters:
args – log the arguments
- Return type:
None
- error(*args)[source]
Write the sequence of args, with no separators, to the console and output files (if you’ve configured an output file), using the ERROR level.
- Parameters:
args – log the arguments
- Return type:
None
- get_dir()[source]
Get the directory that log files are being written to. Will be None if there is no output directory (i.e., if you didn’t call start).
- Return type:
  Optional[str]
- Returns:
  the logging directory
- info(*args)[source]
Write the sequence of args, with no separators, to the console and output files (if you’ve configured an output file), using the INFO level.
- Parameters:
args – log the arguments
- Return type:
None
- log(*args, level=20)[source]
Write the sequence of args, with no separators, to the console and output files (if you’ve configured an output file).
If the global logger level is higher than the level argument here, the message is not printed to stdout (see logger.py docs).
- Parameters:
  - args – log the arguments
  - level (int) – the logging level (can be DEBUG=10, INFO=20, WARN=30, ERROR=40, DISABLED=50)
- Return type:
None
- record(key, value, exclude=None)[source]
Log a value of some diagnostic. Call this once for each diagnostic quantity, each iteration. If called many times, the last value will be used.
- Parameters:
  - key (str) – save to log this key
  - value (Any) – save to log this value
  - exclude (Union[str, Tuple[str, ...], None]) – outputs to be excluded
- Return type:
None
- record_mean(key, value, exclude=None)[source]
The same as record(), but if called many times, the values are averaged.
- Parameters:
  - key (str) – save to log this key
  - value (Optional[float]) – save to log this value
  - exclude (Union[str, Tuple[str, ...], None]) – outputs to be excluded
- Return type:
None
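A minimal sketch contrasting record() and record_mean(); it also uses the logger’s dump() method to flush the recorded values to the configured outputs (the keys, values and step are made up):
from stable_baselines3.common.logger import configure

logger = configure("/tmp/sb3_log/", ["stdout"])

# record() keeps only the last value written for a key...
logger.record("train/loss", 0.5)
logger.record("train/loss", 0.3)  # overwrites 0.5

# ...while record_mean() averages all values written since the last dump
logger.record_mean("rollout/ep_rew_mean", -200.0)
logger.record_mean("rollout/ep_rew_mean", -254.0)  # reported as -227.0

logger.dump(step=1)  # write the accumulated key/value pairs to stdout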
- set_level(level)[source]
Set logging threshold on current logger.
- Parameters:
  - level (int) – the logging level (can be DEBUG=10, INFO=20, WARN=30, ERROR=40, DISABLED=50)
- Return type:
  None
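A small sketch of how the threshold interacts with message levels, assuming the module-level WARN constant from stable_baselines3.common.logger:
from stable_baselines3.common.logger import WARN, configure

logger = configure(None, ["stdout"])
logger.set_level(WARN)  # raise the threshold to WARN (30)
logger.info("not printed: INFO (20) is below the WARN (30) threshold")
logger.error("printed: ERROR (40) is at or above the threshold")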
- class stable_baselines3.common.logger.TensorBoardOutputFormat(folder)[source]
Dumps key/value pairs into TensorBoard’s numeric format.
- Parameters:
folder (
str
) – the folder to write the log to
- class stable_baselines3.common.logger.Video(frames, fps)[source]
Video data class storing the video frames and the frames per second
- Parameters:
  - frames (Tensor) – frames to create the video from
  - fps (float) – frames per second
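A minimal sketch of logging a short dummy clip; the (N, T, C, H, W) frame layout is the one expected by TensorBoard’s add_video, and the random frames are placeholders:
import numpy as np
import torch as th

from stable_baselines3.common.logger import Video, configure

logger = configure("/tmp/sb3_log/", ["tensorboard"])

# 1 video, 16 frames, 3 channels, 64x64 pixels
frames = th.from_numpy(np.random.randint(0, 255, (1, 16, 3, 64, 64), dtype=np.uint8))

# Videos are only supported by the TensorBoard writer, hence the exclude tuple
logger.record("trajectory/video", Video(frames, fps=20), exclude=("stdout", "log", "json", "csv"))
logger.dump(step=0)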
- stable_baselines3.common.logger.configure(folder=None, format_strings=None)[source]
Configure the current logger.
- Parameters:
  - folder (Optional[str]) – the save location (if None, $SB3_LOGDIR, if still None, tempdir/SB3-[date & time])
  - format_strings (Optional[List[str]]) – the output logging format (if None, $SB3_LOG_FORMAT, if still None, [‘stdout’, ‘log’, ‘csv’])
- Return type:
  Logger
- Returns:
  The logger object.
- stable_baselines3.common.logger.filter_excluded_keys(key_values, key_excluded, _format)[source]
Filters the keys specified by key_excluded for the specified format.
- Parameters:
  - key_values (Dict[str, Any]) – log dictionary to be filtered
  - key_excluded (Dict[str, Tuple[str, ...]]) – keys to be excluded per format
  - _format (str) – format for which this filter is run
- Return type:
  Dict[str, Any]
- Returns:
  dict without the excluded keys
- stable_baselines3.common.logger.make_output_format(_format, log_dir, log_suffix='')[source]
Return a logger for the requested format.
- Parameters:
  - _format (str) – the requested format to log to (‘stdout’, ‘log’, ‘json’, ‘csv’ or ‘tensorboard’)
  - log_dir (str) – the logging directory
  - log_suffix (str) – the suffix for the log file
- Return type:
  KVWriter
- Returns:
  the logger
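A minimal sketch of assembling a Logger by hand from individual writers; configure() above is essentially a convenience wrapper around this (the directory path is a placeholder):
import os

from stable_baselines3.common.logger import Logger, make_output_format

log_dir = "/tmp/sb3_log/"
os.makedirs(log_dir, exist_ok=True)

# One KVWriter per requested format
writers = [make_output_format(f, log_dir) for f in ("stdout", "csv")]
logger = Logger(folder=log_dir, output_formats=writers)

logger.record("rollout/ep_rew_mean", -227.0)
logger.dump(step=77_824)  # flush the recorded values to both writers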