Practice Deep Reinforcement Learning (DRL) with Gymnasium.
- Easy hands-on exercises on your laptop (macOS/Windows/Linux).
- No long training runs.
Check the Command Guide for the step-by-step commands:
- Create the conda env with pip.
- Exercise
  - For each exercise, implement all `NotImplementedError`s in the `*_exercise.py` file, then train it with the provided command.
  - [Optional] Generate a video and push the video/result to the Hugging Face Hub.
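The `NotImplementedError` stubs typically take a shape like the following sketch (a hypothetical `q_learning_exercise.py` fragment; the class and method names are illustrative assumptions, not this repo's actual API):

```python
# Hypothetical shape of an exercise file: the scaffold raises
# NotImplementedError wherever the learner must fill in the algorithm.
class QLearningAgent:
    def __init__(self, n_states: int, n_actions: int, lr: float = 0.1, gamma: float = 0.99):
        self.q = [[0.0] * n_actions for _ in range(n_states)]  # Q table, zero-initialized
        self.lr, self.gamma = lr, gamma

    def act(self, state: int) -> int:
        # TODO(exercise): replace the stub with greedy or epsilon-greedy selection.
        raise NotImplementedError

    def update(self, s: int, a: int, r: float, s_next: int, done: bool) -> None:
        # TODO(exercise): replace the stub with the TD update of Q(s, a).
        raise NotImplementedError

# A completed act() might read:
#     return max(range(len(self.q[state])), key=lambda a: self.q[state][a])
```

Training with an unfilled stub fails immediately with `NotImplementedError`, which is how you find the parts left to implement.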
Don't choose a game that is too hard or a neural network that is too big; you can still try that on your own if you like.
| Exercise | Algorithm | Verification Game | Challenge Game | State | Action |
|---|---|---|---|---|---|
| 1. q_learning | Q Table | FrozenLake | Taxi | 📊 | 📊 |
| 2. dqn | Deep Q Network -> Rainbow | LunarLander-v3 (1D state) | LunarLander-v3 (image state) | 🌊 | 📊 |
| 3. reinforce | Reinforce (Monte Carlo) | CartPole-v1 | - | 🌊 | 📊 |
| 4. curiosity | Curiosity (Reinforce, baseline, shaping reward) | - | MountainCar-v0 | 🌊 | 📊 |
| 5. A2C | A2C+GAE (or A2C+TD-n) | CartPole-v1 | LunarLander-v3 | 🌊 | 📊 |
| 6. A3C | A3C (using A2C+GAE) | CartPole-v1 | LunarLander-v3 | 🌊 | 📊 |
| 7. PPO | PPO | CartPole-v1 | LunarLander-v3 | 🌊 | 📊 |
| 8. TD3 | Twin Delayed DDPG (TD3) | Pendulum-v1 | Walker2d-v5 | 🌊 | 🌊 |
| 9. SAC | SAC (Soft Actor-Critic) | Pendulum-v1 | Walker2d-v5 | 🌊 | 🌊 |
| 10. PPO+DDP | PPO+Curiosity | Reacher-v5 | Pusher-v5 | 🌊 | 🌊 |
| 11. SAC+DDP | SAC+PER | Reacher-v5 | Pusher-v5 | 🌊 | 🌊 |
| 12. MBPO | Model-Based Policy Optimization | Reacher-v5 | Walker2d-v5 | 🌊 | 🌊 |
where 🌊 = continuous, 📊 = discrete.
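Exercise 1's Q-table core can be sketched without Gymnasium at all. Below is a minimal tabular Q-learning loop on a toy 5-state chain (a stand-in for FrozenLake; the environment, `step()` function, and hyperparameters are illustrative assumptions, not this repo's code):

```python
import random

N_STATES = 5
ACTIONS = [0, 1]                     # 0 = left, 1 = right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.3    # learning rate, discount, exploration rate

def step(state, action):
    # Deterministic chain; reaching the rightmost state ends the episode with reward 1.
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

q = [[0.0, 0.0] for _ in range(N_STATES)]
random.seed(0)
for _ in range(500):
    s, done = 0, False
    while not done:
        # Epsilon-greedy action selection over the Q table.
        a = random.choice(ACTIONS) if random.random() < EPS else max(ACTIONS, key=lambda x: q[s][x])
        s2, r, done = step(s, a)
        # Q-learning update: Q(s,a) += alpha * (target - Q(s,a)),
        # with target = r + gamma * max_a' Q(s',a') (just r at a terminal state).
        target = r + (0.0 if done else GAMMA * max(q[s2]))
        q[s][a] += ALPHA * (target - q[s][a])
        s = s2

print([round(max(row), 3) for row in q])
```

The state values converge to powers of gamma (0.9^3, 0.9^2, 0.9, 1.0 from left to right), which is the same bootstrapping you verify on FrozenLake in the exercise.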
After studying Hugging Face's DRL course and Pieter Abbeel's Foundations of Deep RL in 6 Lectures, I want to build a deeper and broader understanding through coding.
- RL Algorithms
- OpenAI's Spinning Up
- Stable Baselines3