Atari Wrappers

class stable_baselines3.common.atari_wrappers.AtariWrapper(env, noop_max=30, frame_skip=4, screen_size=84, terminal_on_life_loss=True, clip_reward=True)[source]

Atari 2600 preprocessing

Specifically:

  • NoopReset: obtain the initial state by taking a random number of no-ops on reset.

  • Frame skipping: 4 by default

  • Max-pooling: most recent two observations

  • Termination signal when a life is lost.

  • Resize to a square image: 84x84 by default

  • Grayscale observation

  • Clip reward to {-1, 0, 1}

Parameters
  • env (Env) – gym environment

  • noop_max (int) – max number of no-ops

  • frame_skip (int) – the number of frames to skip: the agent only sees and acts on every frame_skip-th frame.

  • screen_size (int) – resize Atari frame

  • terminal_on_life_loss (bool) – if True, then step() returns done=True whenever a life is lost.

  • clip_reward (bool) – if True (default), the reward is clipped to {-1, 0, 1} depending on its sign.

class stable_baselines3.common.atari_wrappers.ClipRewardEnv(env)[source]

Clips the reward to {+1, 0, -1} by its sign.

Parameters

env (Env) – the environment

reward(reward)[source]

Bin reward to {+1, 0, -1} by its sign.

Parameters

reward (float) – the reward to clip

Return type

float

Returns

the clipped reward
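The sign-binning this wrapper applies can be sketched in a few lines of NumPy (the wrapper's reward() transform is equivalent to taking the sign of the reward):

```python
import numpy as np

def clip_reward(reward):
    # Bin the reward to {-1.0, 0.0, +1.0} by its sign,
    # mirroring what ClipRewardEnv.reward() does.
    return float(np.sign(reward))

clip_reward(12.5)   # 1.0
clip_reward(-0.01)  # -1.0
clip_reward(0.0)    # 0.0
```

This keeps the scale of rewards comparable across games, at the cost of losing information about reward magnitude.
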
class stable_baselines3.common.atari_wrappers.EpisodicLifeEnv(env)[source]

Make end-of-life == end-of-episode, but only reset on true game over. This is what DeepMind did for DQN and related agents, since it helps value estimation.

Parameters

env (Env) – the environment to wrap

reset(**kwargs)[source]

Calls the Gym environment reset, only when lives are exhausted. This way all states are still reachable even though lives are episodic, and the learner need not know about any of this behind-the-scenes.

Parameters

kwargs – Extra keywords passed to env.reset() call

Return type

ndarray

Returns

the first observation of the environment

step(action)[source]

Steps through the environment with action.

Return type

Tuple[Union[Tuple, Dict[str, Any], ndarray, int], float, bool, Dict]

class stable_baselines3.common.atari_wrappers.FireResetEnv(env)[source]

Take the FIRE action on reset, for environments that are fixed until firing.

Parameters

env (Env) – the environment to wrap

reset(**kwargs)[source]

Resets the environment with kwargs.

Return type

ndarray

class stable_baselines3.common.atari_wrappers.MaxAndSkipEnv(env, skip=4)[source]

Return only every skip-th frame (frame skipping)

Parameters
  • env (Env) – the environment

  • skip (int) – number of frames to skip (the action is repeated for skip consecutive frames)

reset(**kwargs)[source]

Resets the environment with kwargs.

Return type

Union[Tuple, Dict[str, Any], ndarray, int]

step(action)[source]

Step the environment with the given action: repeat the action, sum the rewards, and max-pool over the last two observations.

Parameters

action (int) – the action

Return type

Tuple[Union[Tuple, Dict[str, Any], ndarray, int], float, bool, Dict]

Returns

observation, reward, done, information

class stable_baselines3.common.atari_wrappers.NoopResetEnv(env, noop_max=30)[source]

Sample initial states by taking a random number of no-ops on reset. The no-op is assumed to be action 0.

Parameters
  • env (Env) – the environment to wrap

  • noop_max (int) – the maximum value of no-ops to run

reset(**kwargs)[source]

Resets the environment with kwargs.

Return type

ndarray

class stable_baselines3.common.atari_wrappers.WarpFrame(env, width=84, height=84)[source]

Convert to grayscale and warp frames to 84x84 (default) as done in the Nature paper and later work.

Parameters
  • env (Env) – the environment

  • width (int) –

  • height (int) –

observation(frame)[source]

Returns the current observation from a frame.

Parameters

frame (ndarray) – environment frame

Return type

ndarray

Returns

the observation
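
A NumPy-only sketch of the transform is below. The real wrapper uses cv2.cvtColor and cv2.resize (with INTER_AREA interpolation); this version substitutes ITU-R 601 luma weights and nearest-neighbor downsampling just to illustrate the shapes involved:

```python
import numpy as np

def warp_frame(frame, width=84, height=84):
    # Grayscale an RGB frame via luma weights, then downsample by
    # nearest-neighbor index selection. Output keeps a trailing
    # channel axis, matching WarpFrame's (height, width, 1) shape.
    gray = frame @ np.array([0.299, 0.587, 0.114])
    rows = np.arange(height) * frame.shape[0] // height
    cols = np.arange(width) * frame.shape[1] // width
    return gray[rows][:, cols].astype(np.uint8)[:, :, None]
```

For a raw Atari frame of shape (210, 160, 3), this yields an (84, 84, 1) uint8 observation.
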