Problem 1: Model-based Reinforcement Learning with PETS (80 pts)
1.1 Planning with Cross Entropy Method (CEM) [20 pts]
1.1.3 MPC implementation; Comparison of MPC in an environment w/o noise (10 pts)
1.2 Probabilistic Ensemble and Trajectory Sampling (PETS) [60 pts]
1.2.1 Derive the Loss of a Probabilistic Model (5 pts)
1.2.2 Loss and RMSE of a single dynamics model (5 pts)
1.2.3 Planning on a single model with random actions + MPC (5 pts)
1.2.4 Planning on a single model with CEM+MPC (10 pts)
1.2.5 Discussion on the comparison between CEM and random actions (5 pts)
1.2.6 Description of the implementation details (5 pts)
1.2.7 Loss and RMSE of the probabilistic ensemble model (10 pts)
1.2.8 Success percentage of CEM+MPC and random actions + MPC (10 pts)
1.2.9 Limitation of MBRL (5 pts)
MBRL is limited by the dynamics model's capacity to capture the true dynamics: if the learned model is inaccurate, any policy derived from it will be poor, no matter how good the planner is. MBRL also depends on a well-designed cost function for planning, which can take a lot of effort to engineer.
However, MBRL is attractive in settings where we want the model to generalize and achieve different goals, i.e., different target states in the same environment: the planner can be re-targeted simply by changing the cost function it optimizes (see the sketch below). This kind of goal generalization is hard to achieve with policy-gradient methods, which train a policy against a single fixed reward.
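As an illustration of that flexibility, here is a minimal Python sketch (hypothetical names such as goal_cost and evaluate_plan; this is not the assignment's starter code) showing how the same learned dynamics model can be pointed at a new goal just by swapping the cost function the planner uses to score rollouts:

```python
import numpy as np

def goal_cost(states, goal):
    # Hypothetical goal-conditioned cost: squared distance of every
    # predicted state in the rollout to the target state.
    # states: (horizon, state_dim), goal: (state_dim,)
    return float(np.sum((states - goal) ** 2))

def evaluate_plan(dynamics_model, state, actions, goal):
    # Roll a candidate action sequence through the learned model and score
    # it. Changing `goal` re-targets the same model with no retraining.
    states = []
    for a in actions:
        state = dynamics_model(state, a)  # assumed one-step predictor
        states.append(state)
    return goal_cost(np.stack(states), goal)
```

A planner (random shooting or CEM) would minimize evaluate_plan over candidate action sequences; only the cost changes per goal, never the model.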
Problem 2: Theoretical Questions (20 pts)
2.1 Deterministic vs Stochastic Model (10 pts)
2.2 Aleatoric vs Epistemic Uncertainty (5 pts)
2.3 Failure Modes without Considering Uncertainty (5 pts)
Feedback (1 pt): You can help the course staff improve the course for future semesters by providing feedback. You will receive a point if you provide actionable feedback for each of the following categories.
What advice would you give to future students working on this homework?
Make sure to monitor the standard deviations across iterations in methods like CEM, and check that they do not collapse to 0, since that would kill exploration.
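As a sketch of that safeguard (assumed names like cem_plan and std_floor; this is not the assignment's starter code), a common fix is to clamp the CEM standard deviation to a small floor at every iteration:

```python
import numpy as np

def cem_plan(cost_fn, horizon, act_dim, n_iters=5, pop_size=500,
             n_elites=50, std_floor=1e-2, seed=None):
    # Minimal CEM sketch: cost_fn maps an action sequence of shape
    # (horizon, act_dim) to a scalar cost.
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, act_dim))
    std = np.ones((horizon, act_dim))
    for _ in range(n_iters):
        # Sample a population of action sequences from the current Gaussian.
        samples = mean + std * rng.standard_normal((pop_size, horizon, act_dim))
        costs = np.array([cost_fn(s) for s in samples])
        elites = samples[np.argsort(costs)[:n_elites]]
        mean, std = elites.mean(axis=0), elites.std(axis=0)
        # Clamp std so the sampling distribution never collapses to a point
        # mass; a collapsed std stops all exploration in later iterations.
        std = np.maximum(std, std_floor)
    return mean  # under MPC, only the first action of this plan is executed
```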
Also, check how much data the networks actually train on at each iteration; too little data leads to high-variance gradient updates. Finally, TensorFlow Probability was a great way to sanity-check my loss function.
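For the loss sanity check, here is a minimal sketch of the idea (the gaussian_nll helper is hypothetical, not the assignment's code): compare a hand-written diagonal-Gaussian negative log-likelihood, the loss derived for the probabilistic model in 1.2.1, against TensorFlow Probability's reference value:

```python
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

def gaussian_nll(mean, log_var, target):
    # Hand-written NLL of a diagonal Gaussian, the usual training loss for
    # a probabilistic dynamics model that predicts mean and log-variance.
    inv_var = tf.exp(-log_var)
    return 0.5 * tf.reduce_sum(
        log_var + tf.square(target - mean) * inv_var
        + tf.math.log(2.0 * np.pi), axis=-1)

mean = tf.constant([[0.1, -0.3]])
log_var = tf.constant([[-1.0, 0.5]])
target = tf.constant([[0.0, 0.2]])

# Reference value from TensorFlow Probability.
dist = tfp.distributions.MultivariateNormalDiag(
    loc=mean, scale_diag=tf.exp(0.5 * log_var))
print(gaussian_nll(mean, log_var, target).numpy())
print(-dist.log_prob(target).numpy())  # should match the hand-written NLL
```

If the two numbers disagree, the bug is usually in the log-variance or constant term of the hand-written loss.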
Time Spent (1 pt): How many hours did you spend working on this assignment? Your answer will not affect your grade.
Alone: 20
With teammates: 0
With other classmates: 0
At office hours: 2