CSCI-GA3033-090-Homework 3 Solved

Questions
In the code folder, you will find an existing implementation of REINFORCE. Run this code on the following environments: Pendulum-v0, BipedalWalker-v3, and LunarLanderContinuous-v2. It is okay if REINFORCE does not perform well in these environments. For each environment, record the reward over training time for three different seeds, and create three plots (one per environment) that show the average performance of REINFORCE across the seeds. Why do you think REINFORCE struggles in these environments?
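One way to produce the per-environment average is sketched below. It assumes each training run has been logged as a plain list of episode returns, one list per seed; the function names and the logging format are hypothetical and not taken from the provided code folder. Runs of unequal length are truncated to the shortest one, which is a simplifying choice rather than a requirement of the assignment.

```python
from statistics import mean, pstdev


def average_over_seeds(curves):
    """Return (mean, std) per episode across seeds.

    curves: list of per-seed lists of episode returns (hypothetical log format).
    Runs are truncated to the shortest one so every episode index has all seeds.
    """
    n = min(len(c) for c in curves)
    columns = [[c[t] for c in curves] for t in range(n)]
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]  # population std across seeds
    return means, stds


def plot_curves(results, env_name, path):
    """Plot mean +/- std for each labelled algorithm and save the figure.

    results: dict mapping a label (e.g. "REINFORCE") to its per-seed curves.
    matplotlib is imported here so the averaging code works without it.
    """
    import matplotlib
    matplotlib.use("Agg")  # render to file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for label, curves in results.items():
        means, stds = average_over_seeds(curves)
        x = list(range(len(means)))
        lower = [m - s for m, s in zip(means, stds)]
        upper = [m + s for m, s in zip(means, stds)]
        ax.plot(x, means, label=label)
        ax.fill_between(x, lower, upper, alpha=0.2)
    ax.set_xlabel("Episode")
    ax.set_ylabel("Return")
    ax.set_title(env_name)
    ax.legend()
    fig.savefig(path)
    plt.close(fig)
```

Calling `plot_curves({"REINFORCE": [seed0, seed1, seed2]}, "Pendulum-v0", "pendulum.png")` once per environment would give the three plots; the shaded band shows the spread across seeds.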
 

Now, complete the PPO code in ppo/ppo.py, which contains several TODOs for you to fill in. Refer to the original PPO pseudocode if you need to. As before, use the same three environments and three different seeds to plot your training rewards. Clearly show the comparison between REINFORCE and PPO in your plots.
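For orientation, the core of the original PPO pseudocode is the clipped surrogate objective: the probability ratio between the new and old policy is multiplied by the advantage, and the result is compared against the same product with the ratio clipped to [1 - eps, 1 + eps], keeping the smaller of the two. The sketch below shows only that arithmetic in NumPy. The function name, argument names, and the default `clip_eps=0.2` are illustrative assumptions and do not correspond to the TODOs in ppo/ppo.py, which may be written against an autograd framework and may include value and entropy terms as well.

```python
import numpy as np


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate policy loss (negated so that lower is better).

    All arguments are 1-D arrays with one entry per sampled action.
    Names and the default clip_eps are illustrative assumptions.
    """
    new_log_probs = np.asarray(new_log_probs, dtype=float)
    old_log_probs = np.asarray(old_log_probs, dtype=float)
    advantages = np.asarray(advantages, dtype=float)

    # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space.
    ratio = np.exp(new_log_probs - old_log_probs)

    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages

    # Take the pessimistic (smaller) objective per sample, then average.
    return -np.mean(np.minimum(unclipped, clipped))
```

With a positive advantage, the ratio stops contributing once it exceeds 1 + eps; with a negative advantage, the unclipped term is kept when the ratio grows, so the penalty is not bounded by the clip. This is what limits how far a single update can push the policy in the direction that improves the objective.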
