Both the state space and the action space are continuous.
Skip this step if you want to use the provided expert data.
- Install stable-baselines3
pip install "stable-baselines3[extra]"
- Generate data: data_gen_pendulum.ipynb
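As a rough sketch of what data generation involves, the snippet below rolls out an expert and records (state, action) pairs. `ToyEnv`, `expert`, and `collect` are hypothetical stand-ins and are not taken from the notebook; in practice the environment would presumably be the pendulum and the expert a trained stable-baselines3 model, whose `predict(obs, deterministic=True)` returns an `(action, state)` tuple.

```python
import numpy as np

class ToyEnv:
    """Hypothetical stand-in that mimics the Gymnasium reset/step API."""

    def __init__(self, horizon=50):
        self.horizon = horizon

    def reset(self, seed=None):
        self.rng = np.random.default_rng(seed)
        self.t = 0
        self.obs = self.rng.uniform(-1.0, 1.0, size=3)
        return self.obs, {}

    def step(self, action):
        self.t += 1
        self.obs = self.rng.uniform(-1.0, 1.0, size=3)
        truncated = self.t >= self.horizon  # episode ends after `horizon` steps
        return self.obs, 0.0, False, truncated, {}

def collect(env, expert, n_steps):
    """Run the expert for n_steps and return arrays of states and actions."""
    states, actions = [], []
    obs, _ = env.reset(seed=0)
    for _ in range(n_steps):
        action = expert(obs)
        states.append(obs)
        actions.append(action)
        obs, _, terminated, truncated, _ = env.step(action)
        if terminated or truncated:
            obs, _ = env.reset()
    return np.array(states), np.array(actions)

# Made-up controller used only for illustration (assumption, not a real expert).
expert = lambda obs: np.array([-obs[2]])
states, actions = collect(ToyEnv(), expert, n_steps=120)
```

The resulting `states` and `actions` arrays are the kind of dataset that behavioural cloning then fits.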
- Predict the action directly from the state
- Mean squared error (MSE) loss
Using scikit-learn: bc_pendulum_sklearn.ipynb
Using PyTorch: bc_pendulum_torch.ipynb
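The idea of predicting the action from the state under an MSE loss can be illustrated with a minimal NumPy sketch. It is not the code from either notebook: the data is synthetic, the "expert" weights `true_w` are made up, and a linear policy is assumed purely to keep the example short. The observation layout (cos θ, sin θ, angular velocity) follows the usual Gym pendulum convention.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for expert data (hypothetical, not the provided dataset).
theta = rng.uniform(-np.pi, np.pi, size=1000)
theta_dot = rng.uniform(-8.0, 8.0, size=1000)
states = np.stack([np.cos(theta), np.sin(theta), theta_dot], axis=1)
true_w = np.array([[0.5], [-2.0], [-0.1]])  # made-up "expert" weights
actions = states @ true_w + 0.01 * rng.normal(size=(1000, 1))

# Behavioural cloning with a linear policy: least squares minimises the MSE.
X = np.hstack([states, np.ones((len(states), 1))])  # append a bias column
W, *_ = np.linalg.lstsq(X, actions, rcond=None)

def policy(state):
    """Predict the action directly from a single state."""
    return np.append(state, 1.0) @ W

mse = float(np.mean((X @ W - actions) ** 2))
```

A neural network or any scikit-learn regressor could take the place of the least-squares fit; the objective stays the same.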
- Predict the mean and variance of the action
- Negative log-likelihood loss
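For a Gaussian policy with predicted mean μ and variance σ², the negative log-likelihood of an expert action a is 0.5 · (log(2πσ²) + (a − μ)² / σ²). The sketch below evaluates it in NumPy, assuming the model outputs the log-variance so that the variance stays positive; the function name `gaussian_nll` is illustrative and not taken from the notebooks. PyTorch offers a comparable loss as `torch.nn.GaussianNLLLoss`.

```python
import numpy as np

def gaussian_nll(mean, log_var, action):
    """Mean Gaussian negative log-likelihood of `action` under N(mean, exp(log_var))."""
    mean, log_var, action = map(np.asarray, (mean, log_var, action))
    var = np.exp(log_var)
    per_sample = 0.5 * (np.log(2.0 * np.pi) + log_var + (action - mean) ** 2 / var)
    return float(np.mean(per_sample))

# The same prediction error costs more when the model claims low variance.
loss_wide = gaussian_nll(mean=[0.0], log_var=[0.0], action=[1.0])
loss_narrow = gaussian_nll(mean=[0.0], log_var=[np.log(0.01)], action=[1.0])
```

Unlike MSE, this loss lets the model express uncertainty: a larger predicted variance softens the squared-error term but is penalised by the log-variance term.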