Atari Wrappers
- class stable_baselines3.common.atari_wrappers.AtariWrapper(env, noop_max=30, frame_skip=4, screen_size=84, terminal_on_life_loss=True, clip_reward=True, action_repeat_probability=0.0)
Atari 2600 preprocessing
Specifically:
- Noop reset: obtain initial state by taking a random number of no-ops on reset.
- Frame skipping: 4 by default
- Max-pooling: most recent two observations
- Termination signal when a life is lost.
- Resize to a square image: 84x84 by default
- Grayscale observation
- Clip reward to {-1, 0, 1}
- Sticky actions: disabled by default
See https://danieltakeshi.github.io/2016/11/25/frame-skipping-and-preprocessing-for-deep-q-networks-on-atari-2600-games/ for a visual explanation.
Warning
Use this wrapper only with Atari v4 without frame skip:
env_id = "*NoFrameskip-v4"
- Parameters:
  - env (Env) – Environment to wrap
  - noop_max (int) – Max number of no-ops
  - frame_skip (int) – Frequency at which the agent experiences the game. This corresponds to repeating the action frame_skip times.
  - screen_size (int) – Resize Atari frame to a screen_size x screen_size square
  - terminal_on_life_loss (bool) – If True, then step() returns done=True whenever a life is lost.
  - clip_reward (bool) – If True (default), the reward is clipped to {-1, 0, 1} depending on its sign.
  - action_repeat_probability (float) – Probability of repeating the last action
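A minimal usage sketch (assuming gym and the Atari ROMs are installed; note the NoFrameskip env id, per the warning above):

```python
import gym

from stable_baselines3.common.atari_wrappers import AtariWrapper

# Frame skipping must be disabled in the underlying env (NoFrameskip-v4)
# so that the wrapper can do its own action repeat and max-pooling.
env = gym.make("BreakoutNoFrameskip-v4")
env = AtariWrapper(env)

obs = env.reset()
print(obs.shape)  # (84, 84, 1): a single grayscale frame resized to 84x84
```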
- class stable_baselines3.common.atari_wrappers.ClipRewardEnv(env)
Clip the reward to {+1, 0, -1} by its sign.
- Parameters:
  - env (Env) – Environment to wrap
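The clipping is a pure sign operation; a minimal sketch of the transformation applied to each reward:

```python
import numpy as np

def clip_reward(reward: float) -> float:
    # np.sign maps positive rewards to +1, negative to -1, and zero to 0,
    # which is exactly the set {+1, 0, -1} described above.
    return float(np.sign(reward))

assert clip_reward(100.0) == 1.0
assert clip_reward(-0.01) == -1.0
assert clip_reward(0.0) == 0.0
```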
- class stable_baselines3.common.atari_wrappers.EpisodicLifeEnv(env)
Make end-of-life == end-of-episode, but only reset on true game over. Done by DeepMind for the DQN and co. since it helps value estimation.
- Parameters:
  - env (Env) – Environment to wrap
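A usage sketch (hypothetical random-action loop): after this wrapper, done=True no longer guarantees a true game over.

```python
import gym

from stable_baselines3.common.atari_wrappers import EpisodicLifeEnv

env = EpisodicLifeEnv(gym.make("BreakoutNoFrameskip-v4"))
obs = env.reset()
done = False
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())
# done=True here may only mean a single life was lost; the wrapper performs
# a true ALE reset only once all lives are exhausted (see reset() below).
```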
- reset(**kwargs)
Calls the Gym environment reset, only when lives are exhausted. This way all states are still reachable even though lives are episodic, and the learner need not know about any of this behind-the-scenes.
- Parameters:
  - kwargs – Extra keywords passed to the env.reset() call
- Return type: ndarray
- Returns: the first observation of the environment
- step(action)
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Return type: Tuple[Union[Tuple, Dict[str, Any], ndarray, int], float, bool, Dict]
- Args:
  - action (object): an action provided by the agent
- Returns:
  - observation (object): agent's observation of the current environment
  - reward (float): amount of reward returned after previous action
  - done (bool): whether the episode has ended, in which case further step() calls will return undefined results
  - info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
- class stable_baselines3.common.atari_wrappers.FireResetEnv(env)
Take action on reset for environments that are fixed until firing.
- Parameters:
  - env (Env) – Environment to wrap
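For example, Breakout does nothing until FIRE launches the ball. A usage sketch (get_action_meanings() is the Atari env method the wrapper relies on):

```python
import gym

from stable_baselines3.common.atari_wrappers import FireResetEnv

base_env = gym.make("BreakoutNoFrameskip-v4")
# The wrapper expects action 1 to be FIRE, e.g. ['NOOP', 'FIRE', 'RIGHT', 'LEFT']
print(base_env.unwrapped.get_action_meanings())

env = FireResetEnv(base_env)
obs = env.reset()  # FIRE has been pressed automatically; the ball is in play
```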
- reset(**kwargs)
Resets the environment to an initial state and returns an initial observation.
Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.
- Return type: ndarray
- Returns: observation (object): the initial observation.
- class stable_baselines3.common.atari_wrappers.MaxAndSkipEnv(env, skip=4)
Return only every skip-th frame (frame skipping) and return the max between the two last frames.
- Parameters:
  - env (Env) – Environment to wrap
  - skip (int) – Number of frames to skip; the same action will be taken skip times.
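A sketch of the step logic under the hood (assuming uint8 image observations): the element-wise max over the two last raw frames removes the flicker caused by Atari rendering some sprites only on alternating frames.

```python
import numpy as np

def max_and_skip_step(env, action, skip=4):
    """Sketch: repeat `action` for `skip` frames, sum the rewards, and
    max-pool the two last raw frames into a single observation."""
    obs_buffer = np.zeros((2, *env.observation_space.shape), dtype=np.uint8)
    total_reward, done, info = 0.0, False, {}
    for i in range(skip):
        obs, reward, done, info = env.step(action)
        if i == skip - 2:
            obs_buffer[0] = obs
        if i == skip - 1:
            obs_buffer[1] = obs
        total_reward += reward
        if done:
            break
    return obs_buffer.max(axis=0), total_reward, done, info
```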
- class stable_baselines3.common.atari_wrappers.NoopResetEnv(env, noop_max=30)
Sample initial states by taking a random number of no-ops on reset. No-op is assumed to be action 0.
- Parameters:
  - env (Env) – Environment to wrap
  - noop_max (int) – Maximum value of no-ops to run
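Usage sketch; the number of no-ops is drawn uniformly from [1, noop_max] on each reset:

```python
import gym

from stable_baselines3.common.atari_wrappers import NoopResetEnv

env = NoopResetEnv(gym.make("BreakoutNoFrameskip-v4"), noop_max=30)
obs = env.reset()  # between 1 and 30 NOOP (action 0) steps have already run
```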
- reset(**kwargs)
Resets the environment to an initial state and returns an initial observation.
Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.
- Return type: ndarray
- Returns: observation (object): the initial observation.
- class stable_baselines3.common.atari_wrappers.StickyActionEnv(env, action_repeat_probability)
Sticky action.
Paper: https://arxiv.org/abs/1709.06009
Official implementation: https://github.com/mgbellemare/Arcade-Learning-Environment
- Parameters:
  - env (Env) – Environment to wrap
  - action_repeat_probability (float) – Probability of repeating the last action
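A minimal sketch of the mechanism (simplified; this hypothetical class ignores seeding and reset handling, which the real wrapper takes care of):

```python
import numpy as np

class StickyActionSketch:
    """With probability p, execute the previous action instead of the new one."""

    def __init__(self, env, action_repeat_probability: float):
        self.env = env
        self.p = action_repeat_probability
        self._last_action = 0  # NOOP

    def step(self, action):
        if np.random.rand() < self.p:
            action = self._last_action  # the requested action is ignored
        self._last_action = action
        return self.env.step(action)
```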
- reset(**kwargs)
Resets the environment to an initial state and returns an initial observation.
Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.
- Return type: Union[Tuple, Dict[str, Any], ndarray, int]
- Returns: observation (object): the initial observation.
- step(action)
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Return type: Tuple[Union[Tuple, Dict[str, Any], ndarray, int], float, bool, Dict]
- Args:
  - action (object): an action provided by the agent
- Returns:
  - observation (object): agent's observation of the current environment
  - reward (float): amount of reward returned after previous action
  - done (bool): whether the episode has ended, in which case further step() calls will return undefined results
  - info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)