Atari Wrappers

class stable_baselines3.common.atari_wrappers.AtariWrapper(env, noop_max=30, frame_skip=4, screen_size=84, terminal_on_life_loss=True, clip_reward=True)[source]

Atari 2600 preprocessing

Specifically:

  • NoopReset: obtain the initial state by taking a random number of no-ops on reset.

  • Frame skipping: 4 by default

  • Max-pooling: most recent two observations

  • Termination signal when a life is lost.

  • Resize to a square image: 84x84 by default

  • Grayscale observation

  • Clip reward to {-1, 0, 1}

Parameters
  • env (Env) – gym environment

  • noop_max (int) – max number of no-ops

  • frame_skip (int) – the number of frames to skip: the agent only sees and acts on every frame_skip-th frame.

  • screen_size (int) – resize Atari frame

  • terminal_on_life_loss (bool) – if True, then step() returns done=True whenever a life is lost.

  • clip_reward (bool) – if True (default), the reward is clipped to {-1, 0, 1} depending on its sign.

class stable_baselines3.common.atari_wrappers.ClipRewardEnv(env)[source]

Clips the reward to {+1, 0, -1} by its sign.

Parameters

env (Env) – the environment

reward(reward)[source]

Bin reward to {+1, 0, -1} by its sign.

Parameters

reward (float) – the reward to clip

Return type

float

Returns

the clipped reward
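The sign-binning this wrapper applies can be sketched in a few lines of NumPy (the wrapper's reward() transform is equivalent to taking the sign of the reward):

```python
import numpy as np

def clip_reward(reward):
    # Bin the reward to {-1.0, 0.0, +1.0} by its sign,
    # mirroring what ClipRewardEnv.reward() does.
    return float(np.sign(reward))

clip_reward(12.5)   # 1.0
clip_reward(-0.01)  # -1.0
clip_reward(0.0)    # 0.0
```

This keeps the scale of rewards comparable across games, at the cost of losing information about reward magnitude.
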
class stable_baselines3.common.atari_wrappers.EpisodicLifeEnv(env)[source]

Make end-of-life == end-of-episode, but only reset on true game over. This is what DeepMind did for DQN and related agents, since it helps value estimation.

Parameters

env (Env) – the environment to wrap

reset(**kwargs)[source]

Calls the Gym environment reset, only when lives are exhausted. This way all states are still reachable even though lives are episodic, and the learner need not know about any of this behind-the-scenes.

Parameters

kwargs – Extra keywords passed to env.reset() call

Return type

ndarray

Returns

the first observation of the environment

step(action)[source]

Steps through the environment with action.

Return type

Tuple[Union[Tuple, Dict[str, Any], ndarray, int], float, bool, Dict]

class stable_baselines3.common.atari_wrappers.FireResetEnv(env)[source]

Take the FIRE action on reset, for environments that are fixed until firing.

Parameters

env (Env) – the environment to wrap

reset(**kwargs)[source]

Resets the environment with kwargs.

Return type

ndarray

class stable_baselines3.common.atari_wrappers.MaxAndSkipEnv(env, skip=4)[source]

Return only every skip-th frame (frame skipping)

Parameters
  • env (Env) – the environment

  • skip (int) – number of frames to skip (the action is repeated for skip consecutive frames)

reset(**kwargs)[source]

Resets the environment with kwargs.

Return type

Union[Tuple, Dict[str, Any], ndarray, int]

step(action)[source]

Step the environment with the given action: repeat the action, sum the rewards, and max-pool over the last two observations.

Parameters

action (int) – the action

Return type

Tuple[Union[Tuple, Dict[str, Any], ndarray, int], float, bool, Dict]

Returns

observation, reward, done, information

class stable_baselines3.common.atari_wrappers.NoopResetEnv(env, noop_max=30)[source]

Sample initial states by taking a random number of no-ops on reset. The no-op is assumed to be action 0.

Parameters
  • env (Env) – the environment to wrap

  • noop_max (int) – the maximum value of no-ops to run

reset(**kwargs)[source]

Resets the environment with kwargs.

Return type

ndarray

class stable_baselines3.common.atari_wrappers.WarpFrame(env, width=84, height=84)[source]

Convert to grayscale and warp frames to 84x84 (default) as done in the Nature paper and later work.

Parameters
  • env (Env) – the environment

  • width (int) –

  • height (int) –

observation(frame)[source]

Returns the current observation from a frame.

Parameters

frame (ndarray) – environment frame

Return type

ndarray

Returns

the observation
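
A NumPy-only sketch of the transform is below. The real wrapper uses cv2.cvtColor and cv2.resize (with INTER_AREA interpolation); this version substitutes ITU-R 601 luma weights and nearest-neighbor downsampling just to illustrate the shapes involved:

```python
import numpy as np

def warp_frame(frame, width=84, height=84):
    # Grayscale an RGB frame via luma weights, then downsample by
    # nearest-neighbor index selection. Output keeps a trailing
    # channel axis, matching WarpFrame's (height, width, 1) shape.
    gray = frame @ np.array([0.299, 0.587, 0.114])
    rows = np.arange(height) * frame.shape[0] // height
    cols = np.arange(width) * frame.shape[1] // width
    return gray[rows][:, cols].astype(np.uint8)[:, :, None]
```

For a raw Atari frame of shape (210, 160, 3), this yields an (84, 84, 1) uint8 observation.
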