Getting StartedΒΆ

Most of the library tries to follow a sklearn-like syntax for the Reinforcement Learning algorithms.

Here is a quick example of how to train and run A2C on a CartPole environment:

import gym

from stable_baselines3 import A2C

env = gym.make('CartPole-v1')

model = A2C('MlpPolicy', env, verbose=1)

obs = env.reset()
for i in range(1000):
    action, _state = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    if done:
      obs = env.reset()


You can find explanations about the logger output and names in the Logger section.

Or just train a model with a one liner if the environment is registered in Gym and if the policy is registered:

from stable_baselines3 import A2C

model = A2C('MlpPolicy', 'CartPole-v1').learn(10000)