Atari Wrappers

class stable_baselines3.common.atari_wrappers.AtariWrapper(env: gym.core.Env, noop_max: int = 30, frame_skip: int = 4, screen_size: int = 84, terminal_on_life_loss: bool = True, clip_reward: bool = True)[source]

Atari 2600 preprocessing

Specifically:

  • NoopReset: obtain initial state by taking a random number of no-ops on reset.

  • Frame skipping: 4 by default

  • Max-pooling: most recent two observations

  • Termination signal when a life is lost.

  • Resize to a square image: 84x84 by default

  • Grayscale observation

  • Clip reward to {-1, 0, 1}

Parameters
  • env – (gym.Env) gym environment

  • noop_max – (int): max number of no-ops

  • frame_skip – (int): the frequency at which the agent experiences the game.

  • screen_size – (int): resize Atari frame

  • terminal_on_life_loss – (bool): if True, then step() returns done=True whenever a life is lost.

  • clip_reward – (bool) If True (default), the reward is clipped to {-1, 0, 1} depending on its sign.
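
A minimal usage sketch (assuming the Atari ROMs are installed so that gym can create the game; "BreakoutNoFrameskip-v4" is only an illustrative choice of environment id):

    import gym
    from stable_baselines3.common.atari_wrappers import AtariWrapper

    # Use a "NoFrameskip" variant so that frame skipping is done by the wrapper,
    # not by the underlying environment.
    env = gym.make("BreakoutNoFrameskip-v4")
    env = AtariWrapper(env)

    obs = env.reset()
    print(obs.shape)  # (84, 84, 1): grayscale frame resized to 84x84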

class stable_baselines3.common.atari_wrappers.ClipRewardEnv(env: gym.core.Env)[source]
reward(reward: float) → float[source]

Bin reward to {+1, 0, -1} by its sign.

Parameters

reward – (float)

Returns

(float)
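
The binning is sign-based: any positive reward becomes +1, any negative reward becomes -1, and zero stays 0. A small sketch of the equivalent operation (illustrative only, not the wrapper's internal code):

    import numpy as np

    # Sign-based binning of raw Atari rewards.
    for raw_reward in (-3.7, 0.0, 12.0):
        print(raw_reward, "->", float(np.sign(raw_reward)))  # -1.0, 0.0, 1.0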

class stable_baselines3.common.atari_wrappers.EpisodicLifeEnv(env: gym.core.Env)[source]
reset(**kwargs) → numpy.ndarray[source]

Calls the Gym environment reset, only when lives are exhausted. This way all states are still reachable even though lives are episodic, and the learner need not know about any of this behind-the-scenes.

Parameters

kwargs – Extra keywords passed to env.reset() call

Returns

(np.ndarray) the first observation of the environment

step(action: int) → Tuple[Union[Tuple, Dict[str, Any], numpy.ndarray, int], float, bool, Dict][source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Args:

action (object): an action provided by the agent

Returns:

observation (object): agent’s observation of the current environment

reward (float): amount of reward returned after previous action

done (bool): whether the episode has ended, in which case further step() calls will return undefined results

info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
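
A sketch of the resulting episode structure (assuming a multi-life game such as Breakout and that the Atari ROMs are installed): done=True is reported whenever a life is lost, but a full environment reset only happens once all lives are exhausted.

    import gym
    from stable_baselines3.common.atari_wrappers import EpisodicLifeEnv

    env = EpisodicLifeEnv(gym.make("BreakoutNoFrameskip-v4"))
    obs = env.reset()
    done = False
    while not done:
        obs, reward, done, info = env.step(env.action_space.sample())
    # done=True here may only mean a life was lost; this reset() call performs
    # a real reset only when no lives remain.
    obs = env.reset()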

class stable_baselines3.common.atari_wrappers.FireResetEnv(env: gym.core.Env)[source]
reset(**kwargs) → numpy.ndarray[source]

Resets the state of the environment and returns an initial observation.

Returns:

observation (object): the initial observation.
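
Some Atari games (e.g. Breakout) stay frozen until the FIRE action is pressed; this wrapper presses it automatically on reset. A minimal sketch (the environment's action set must contain FIRE):

    import gym
    from stable_baselines3.common.atari_wrappers import FireResetEnv

    env = FireResetEnv(gym.make("BreakoutNoFrameskip-v4"))
    obs = env.reset()  # FIRE has already been taken before control returns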

class stable_baselines3.common.atari_wrappers.MaxAndSkipEnv(env: gym.core.Env, skip: int = 4)[source]
reset(**kwargs)[source]

Resets the state of the environment and returns an initial observation.

Returns:

observation (object): the initial observation.

step(action: int) → Tuple[Union[Tuple, Dict[str, Any], numpy.ndarray, int], float, bool, Dict][source]

Step the environment with the given action. Repeat the action, sum the rewards, and take the max over the last observations.

Parameters

action – ([int] or [float]) the action

Returns

([int] or [float], [float], [bool], dict) observation, reward, done, information
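
A sketch of how one wrapper step maps onto the underlying game (assuming the default skip=4): the action is repeated four times, the four rewards are summed, and the returned frame is the pixel-wise max over the last two frames, which removes sprite flickering.

    import gym
    from stable_baselines3.common.atari_wrappers import MaxAndSkipEnv

    env = MaxAndSkipEnv(gym.make("BreakoutNoFrameskip-v4"), skip=4)
    obs = env.reset()
    # A single call below advances the underlying game by 4 frames.
    obs, reward, done, info = env.step(env.action_space.sample())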

class stable_baselines3.common.atari_wrappers.NoopResetEnv(env: gym.core.Env, noop_max: int = 30)[source]
reset(**kwargs) → numpy.ndarray[source]

Resets the state of the environment and returns an initial observation.

Returns:

observation (object): the initial observation.
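
A minimal sketch: on every reset, a random number of no-op actions (between 1 and noop_max) is executed before the first observation is returned, so that episodes start from varied initial states.

    import gym
    from stable_baselines3.common.atari_wrappers import NoopResetEnv

    env = NoopResetEnv(gym.make("BreakoutNoFrameskip-v4"), noop_max=30)
    obs = env.reset()  # between 1 and 30 no-ops have already been applied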

class stable_baselines3.common.atari_wrappers.WarpFrame(env: gym.core.Env, width: int = 84, height: int = 84)[source]
observation(frame: numpy.ndarray) → numpy.ndarray[source]

Returns the current observation from a frame.

Parameters

frame – (np.ndarray) environment frame

Returns

(np.ndarray) the observation
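
A minimal sketch: the wrapper converts each frame to grayscale and resizes it to width x height, so observations have shape (height, width, 1).

    import gym
    from stable_baselines3.common.atari_wrappers import WarpFrame

    env = WarpFrame(gym.make("BreakoutNoFrameskip-v4"), width=84, height=84)
    obs = env.reset()
    print(obs.shape)  # (84, 84, 1)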