$30
Deep Q-learning
Deep Q-learning (DQN) is quite dated now, but many of the algorithms in use today trace their roots back to it.
One algorithm that improved on DQN is Rainbow (https://arxiv.org/abs/1710.02298v1), which combined several contemporary improvements over DQN, such as dueling DQN, double DQN, and Prioritized Experience Replay, into a single algorithm. More recently, Data-regularized Q-learning (DrQ) improved on this baseline by adding image augmentations to DQN.
We provide you with an example implementation of DQN. We ask you to add some of the improvements made in Rainbow and DrQ to arrive at an almost state-of-the-art deep RL algorithm.
Environment
The environments we will use in this homework are built upon the Pong, Space Invaders, and Breakout environments from the OpenAI Gym Atari suite (https://gym.openai.com/envs/#atari). In this homework, we will attempt to train agents on these environments from image observations. We already have a working implementation of DQN in the code folder, which you can run as python train.py env=Breakout, and similarly for the other two environments. Your job is to complete all the TODOs and turn on the completed features one by one. You can download the code folder, with all associated files, from here: https://drive.google.com/drive/folders/1T8B3gSNWjQU-JpifHkEDm9FfoxJA6wB_?usp=sharing
Question 1
Download the code folder, and run the code for the three given environments. Make a plot of their performance over time. This is your baseline, and you will compare your future improvements to the code with this baseline to test their validity.
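Raw episode returns from DQN tend to be noisy, so it can help to smooth them before plotting. The helper below is a minimal sketch of a trailing moving average; the function name moving_average and the window size are illustrative assumptions and are not part of the provided code folder.

```python
import numpy as np

def moving_average(returns, window=10):
    """Smooth a 1-D sequence of episode returns with a trailing window.

    Hypothetical helper for plotting; not part of the provided code.
    Early entries average over however many episodes are available.
    """
    returns = np.asarray(returns, dtype=np.float64)
    if returns.size == 0:
        return returns
    # Prefix sums: csum[k] is the sum of the first k returns.
    csum = np.cumsum(np.insert(returns, 0, 0.0))
    end = np.arange(1, returns.size + 1)
    start = np.maximum(end - window, 0)
    return (csum[end] - csum[start]) / (end - start)
```

The smoothed curve can then be plotted against the episode or environment-step count with whichever plotting tool you prefer.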
Question 2
First, add Double Q-learning to the model (find the place by searching for “TODO: double Q learning” in the code files). Make another plot by running the three environments with your code that uses Double Q-learning.
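As a reminder of the idea, Double Q-learning selects the next action with the online network but evaluates it with the target network. The sketch below shows that target computation in NumPy under assumed array shapes; the function name double_q_target and its arguments are hypothetical, and the provided code may structure this differently (for example with tensors rather than arrays).

```python
import numpy as np

def double_q_target(reward, done, next_q_online, next_q_target, gamma=0.99):
    """Compute Double Q-learning targets for a batch (illustrative sketch).

    reward, done:   arrays of shape (batch,)
    next_q_online:  Q-values of the next state from the online network,
                    shape (batch, n_actions)
    next_q_target:  Q-values of the next state from the target network,
                    shape (batch, n_actions)
    """
    reward = np.asarray(reward, dtype=np.float64)
    done = np.asarray(done, dtype=np.float64)
    next_q_online = np.asarray(next_q_online, dtype=np.float64)
    next_q_target = np.asarray(next_q_target, dtype=np.float64)

    # Action selection uses the online network...
    best_action = np.argmax(next_q_online, axis=1)
    # ...while action evaluation uses the target network.
    next_value = next_q_target[np.arange(best_action.shape[0]), best_action]
    # Terminal transitions (done == 1) do not bootstrap.
    return reward + gamma * (1.0 - done) * next_value
```

In vanilla DQN, by contrast, both the selection and the evaluation would use the target network (a max over next_q_target), which is what tends to overestimate Q-values.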
Question 3
Next, implement Prioritized Experience Replay. In the replay_buffer.py file, we already have an implementation of a prioritized replay buffer. Use that in your code. Find “TODO prioritized replay buffer”, and fix the priority update for the prioritized replay buffer. Make another set of plots, and compare them to the plots from Q2. Is your performance better or worse now? Try explaining your observations.
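For reference, proportional prioritized replay sets each transition's priority from the magnitude of its TD error and corrects the resulting sampling bias with importance weights. The sketch below illustrates both pieces; the function names, the argument layout, and the default values of alpha, beta, and eps are assumptions for illustration and do not reflect the interface in replay_buffer.py.

```python
import numpy as np

def new_priorities(td_errors, alpha=0.6, eps=1e-6):
    """Priority p_i = (|delta_i| + eps) ** alpha (hypothetical helper).

    eps keeps transitions with zero TD error from never being sampled.
    """
    td_errors = np.asarray(td_errors, dtype=np.float64)
    return (np.abs(td_errors) + eps) ** alpha

def importance_weights(priorities, sampled_idx, beta=0.4):
    """Importance-sampling weights for the sampled transitions.

    w_i = (N * P(i)) ** (-beta), normalized by the largest weight
    in the batch so that weights only scale the update downward.
    """
    priorities = np.asarray(priorities, dtype=np.float64)
    probs = priorities / priorities.sum()
    weights = (priorities.shape[0] * probs[np.asarray(sampled_idx)]) ** (-beta)
    return weights / weights.max()
```

After each training step, the TD errors of the sampled batch would be passed through something like new_priorities and written back to the buffer at the sampled indices.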
Question 4
Finally, implement Dueling DQN. Find the places where you have to put your code by searching for “TODO dueling DQN”. As before, plot your performance in the three environments. Now that you have an almost complete implementation of Rainbow, combine all of your plots together and show the improvement over vanilla DQN.
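The core of the dueling architecture is how the state-value and advantage streams are recombined into Q-values. The sketch below shows the mean-subtracted aggregation in NumPy; the function name dueling_q and the array shapes are assumptions, and in the actual network these would be the outputs of two separate heads.

```python
import numpy as np

def dueling_q(value, advantage):
    """Combine value and advantage streams into Q-values (sketch).

    value:     shape (batch, 1), the state-value stream V(s)
    advantage: shape (batch, n_actions), the advantage stream A(s, a)

    Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')
    """
    value = np.asarray(value, dtype=np.float64)
    advantage = np.asarray(advantage, dtype=np.float64)
    # Subtracting the mean advantage makes the decomposition identifiable.
    return value + advantage - advantage.mean(axis=1, keepdims=True)
```

With this aggregation, the mean of the Q-values over actions equals V(s), so the two streams cannot trade off a constant against each other.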
Question 5
We still haven’t used DrQ anywhere. Read the code to try and figure out how to use DrQ on top of DQN. You can read more in the blog post by the authors of the original DrQ paper here: https://sites.google.com/view/data-regularized-q. You will have to:
● Figure out what the best data augmentations are for the environment you have,
● Add those augmentations into the training process, and
● Report (improved) results from using those augmentations.
Python environment installation instructions
Make sure you have conda installed on your system. Instructions link here.
Then, get the conda_env.yml file and, from the directory that contains it, run conda env create -f conda_env.yml.
Finally, run the code with python train.py once you have completed some of the TODO steps in the code itself.