DAgger
...expert is expensive, so we want to learn a policy that is almost as good as the expert without making a large number of queries to it.
We have provided you with an environment that is hard to learn directly. Thankfully, we have access to an expert in this environment. In this homework, your task is to use DAgger to learn a deep neural network policy that performs well on this task.
Environment
The environment we will use is built upon the Reacher environment from OpenAI gym (https://gym.openai.com/envs/Reacher-v2/). We have provided our environment in the reacher_env.py file in our code directory. It follows the OpenAI gym API, which you can learn more about at https://github.com/openai/gym#api. For this homework, an agent in this environment is considered successful if it can achieve a mean reward of at least 15.0.
In this homework, we will attempt to learn this agent from image observations. Unfortunately, learning this agent directly from images without any priors is incredibly difficult, since images lie in a very high-dimensional space. Thankfully, we have access to an expert prediction for whatever state the environment is currently in, which can be retrieved with the get_expert_action() function call. Note: get_expert_action() does not take any arguments, so you must be careful to call it right after you have called .reset() or .step() on the environment to get the expert action associated with that state.
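The call order described in the note above can be sketched as follows. This is a minimal illustration, not the provided environment: StubReacherEnv and collect_labeled_rollout are hypothetical names, the stub's dynamics are invented, and it assumes the older gym convention where .step() returns four values. Only the method names reset, step, get_expert_action and the expert_calls counter come from the handout.

```python
import random


class StubReacherEnv:
    """Hypothetical stand-in for the environment in reacher_env.py.

    Only the call pattern (reset, step, get_expert_action, expert_calls)
    mirrors the handout; the dynamics below are invented for illustration.
    """

    def __init__(self, horizon=5, seed=0):
        self._rng = random.Random(seed)
        self._horizon = horizon
        self._t = 0
        self._state = 0.0
        self.expert_calls = 0

    def reset(self):
        self._t = 0
        self._state = self._rng.uniform(-1.0, 1.0)
        return self._state  # the real environment would return an image

    def step(self, action):
        self._state += action
        self._t += 1
        reward = -abs(self._state)
        done = self._t >= self._horizon
        return self._state, reward, done, {}

    def get_expert_action(self):
        # No arguments: the expert acts on the environment's *current* state.
        self.expert_calls += 1
        return -self._state


def collect_labeled_rollout(env, policy):
    """Roll out `policy` and label every visited observation with the expert action."""
    observations, expert_actions = [], []
    obs = env.reset()
    done = False
    while not done:
        # Query the expert before stepping, so the label matches `obs`.
        expert_actions.append(env.get_expert_action())
        observations.append(obs)
        obs, _reward, done, _info = env.step(policy(obs))
    return observations, expert_actions
```

If the expert were queried after env.step(...) instead, the returned action would belong to the next state, and the (observation, action) pairs would be misaligned by one step.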
Question 1
Download the code folder, with every associated file, from here: https://drive.google.com/drive/folders/1T8B3gSNWjQU-JpifHkEDm9FfoxJA6wB_?usp=sharing Then complete the code template provided in dagger_template.py, filling in the right code in every TODO section, to implement DAgger.
Question 2
Create a plot with the number of expert queries on the X-axis and the performance of the imitation model on the Y-axis. Elaborate on any clear trends you see. (Hint: in the env, the variable expert_calls counts the number of expert queries.)
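One possible way to organize the data for this plot is sketched below. It assumes you record, after each evaluation, the current value of expert_calls together with the rewards of the evaluation episodes. The helper names evaluation_curve and save_plot are hypothetical, the numbers in log are placeholders rather than real results, and save_plot assumes matplotlib is available in your environment.

```python
def evaluation_curve(log):
    """Convert (expert_calls, episode_rewards) records into plot-ready lists."""
    xs = [calls for calls, _rewards in log]
    ys = [sum(rewards) / len(rewards) for _calls, rewards in log]
    return xs, ys


def save_plot(xs, ys, path="queries_vs_reward.png"):
    """Save a query-vs-reward plot; assumes matplotlib is installed."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(xs, ys, marker="o")
    # 15.0 is the success threshold stated in the Environment section.
    ax.axhline(15.0, linestyle="--", label="success threshold (15.0)")
    ax.set_xlabel("Number of expert queries")
    ax.set_ylabel("Mean reward of imitation policy")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)


# Placeholder records, NOT real results: (expert_calls, evaluation rewards).
log = [(200, [1.0, 3.0]), (400, [6.0, 8.0]), (600, [12.0, 16.0])]
xs, ys = evaluation_curve(log)
```

Calling save_plot(xs, ys) then writes the figure to disk. Recording expert_calls at evaluation time, rather than the DAgger iteration index, keeps the X-axis comparable across implementations that query the expert at different rates.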
Question 3
Could you potentially improve on the number of queries to the expert made by the DAgger algorithm? Think about when querying the expert may be redundant.
Try implementing your answer from question 3, and generate a query-vs-reward plot similar to question 2 for this implementation. Compare this plot with your answer from Q2. Is there a clear improvement?
Python environment installation instructions
Make sure you have conda installed in your system. Instructions link here.
Then, get the conda_env.yml file, and from the same directory, run conda env create -f conda_env.yml. If you don't have a GPU, you can remove the line saying - nvidia::cudatoolkit=11.1.
Activate the environment with conda activate hw1_dagger.
Then, install pybullet gym using the following instructions: https://github.com/benelot/pybullet-gym#installing-pybullet-gym
(New: alternatively, just install pybullet-gym from here, thanks to Shubham: https://github.com/shubhamjha97/pybullet-gym)
If you installed it from the official repo, go to the pybullet-gym directory, find this file: pybullet-gym\pybulletgym\envs\roboschool\envs\env_bases.py and change L29-L33 to the following:
self._cam_dist = 0.75
self._cam_yaw = 0
self._cam_pitch = -90
self._render_width = 320
self._render_height = 240
If you are still having trouble with training, increase the image resize resolution from (60, 80) to something higher.
Finally, run the code with python dagger_template.py once you have completed all the to-do steps in the code itself.