Homework #2 TD(λ)
Problem
Description
Recall that the TD(λ) estimator for an MDP can be thought of as a weighted combination of the k-step estimators E_k for k ≥ 1.
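For reference, the weighting can be written out as below. This is the usual λ-return form; writing the combined estimate as TD(λ) is shorthand assumed here, not notation fixed by the assignment. The weights are geometric and sum to 1, and TD(1) is the limiting case in which all weight falls on the full-return estimate.

```latex
% k-step estimators E_k combined with geometric weights (1 - \lambda)\lambda^{k-1}
\mathrm{TD}(\lambda) \;=\; (1-\lambda)\sum_{k=1}^{\infty} \lambda^{k-1} E_k,
\qquad 0 \le \lambda < 1,
\qquad
\sum_{k=1}^{\infty} (1-\lambda)\lambda^{k-1} = 1 .
```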
Consider the MDP described by the following state diagram. (Assume the discount factor is γ = 1.)
Procedure
● Find a value of λ, strictly less than 1, such that the TD(λ) estimate equals the TD(1) estimate. Round your answer for λ to three decimal places.
● This homework is designed to help solidify your understanding of the temporal-difference algorithms and k-step estimators. You will be given the probability of transitioning to State 1, a vector of value estimates, and a vector of rewards {r0, r1, r2, r3, r4, r5, r6}.
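One way to set up the search for λ is sketched below. It assumes every episode ends within K steps, so that E_k equals the full-return estimate E_∞ for all k > K (K and E_∞ are symbols introduced here for illustration). Under that assumption, the infinite weighted sum collapses and the condition TD(λ) = TD(1) becomes a polynomial equation of degree K - 1 in λ.

```latex
\begin{aligned}
\mathrm{TD}(\lambda)
  &= (1-\lambda)\sum_{k=1}^{K}\lambda^{k-1}E_k \;+\; \lambda^{K}E_\infty
  && \text{(tail weights sum to } \lambda^{K}\text{)} \\
\mathrm{TD}(\lambda) = E_\infty
  &\iff (1-\lambda)\sum_{k=1}^{K}\lambda^{k-1}E_k = \bigl(1-\lambda^{K}\bigr)E_\infty \\
  &\iff \sum_{k=1}^{K}\lambda^{k-1}\,\bigl(E_k - E_\infty\bigr) = 0
  && \text{(divide by } 1-\lambda \neq 0\text{)}
\end{aligned}
```

The last step uses 1 - λ^K = (1 - λ)(1 + λ + ... + λ^(K-1)). Dividing out the factor (1 - λ) removes the trivial root λ = 1, so any root in [0, 1) of the remaining polynomial is a candidate answer.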
Examples
The following examples can be used to verify that your calculation is correct.
● Input: probToState=0.81, valueEstimates={0.0,4.0,25.7,0.0,20.1,12.2,0.0}, rewards={7.9,-5.1,2.5,-7.2,9.0,0.0,1.6}, Output: 0.6226326309908364
● Input: probToState=0.22, valueEstimates={0.0,-5.2,0.0,25.4,10.6,9.2,12.3}, rewards={-2.4,0.8,4.0,2.5,8.6,-6.4,6.1}, Output: 0.49567093118984556
● Input: probToState=0.64, valueEstimates={0.0,4.9,7.8,-2.3,25.5,-10.2,-6.5}, rewards={-2.4,9.6,-7.8,0.1,3.4,-2.1,7.9}, Output: 0.20550275877409016
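The sketch below shows one way such a check could be automated. The state diagram is not reproduced in this text, so the layout in the code is an assumption: State 0 goes to State 1 with probability probToState (reward r0) or to State 2 otherwise (reward r1); States 1 and 2 both lead to State 3 (rewards r2 and r3); the chain then runs 3 → 4 → 5 → 6 with rewards r4, r5, r6, and State 6 is terminal. The function names `k_step_estimates` and `solve_lambda` are hypothetical. The solver looks for a root in [0, 1) of the sum over k of λ^(k-1) (E_k - E_∞), which is the condition TD(λ) = TD(1) with the trivial root λ = 1 divided out.

```python
from typing import List, Sequence


def k_step_estimates(p: float, v: Sequence[float], r: Sequence[float]) -> List[float]:
    """Return [E_1, ..., E_5, E_inf] for State 0 (gamma = 1).

    ASSUMED layout: 0 -> 1 (prob p, reward r[0]) or 0 -> 2 (prob 1 - p, r[1]);
    1 -> 3 (r[2]); 2 -> 3 (r[3]); then 3 -> 4 -> 5 -> 6 with r[4], r[5], r[6].
    """
    # Expected reward collected over the first two steps, averaged over both branches.
    first_two = p * (r[0] + r[2]) + (1 - p) * (r[1] + r[3])
    e1 = p * (r[0] + v[1]) + (1 - p) * (r[1] + v[2])
    e2 = first_two + v[3]
    e3 = first_two + r[4] + v[4]
    e4 = first_two + r[4] + r[5] + v[5]
    e5 = first_two + r[4] + r[5] + r[6] + v[6]
    e_inf = first_two + r[4] + r[5] + r[6]  # full return, no bootstrapping
    return [e1, e2, e3, e4, e5, e_inf]


def solve_lambda(p: float, v: Sequence[float], r: Sequence[float], grid: int = 10_000) -> float:
    """Return the smallest root in [0, 1) of sum_k lam**(k-1) * (E_k - E_inf)."""
    *estimates, e_inf = k_step_estimates(p, v, r)
    diffs = [e - e_inf for e in estimates]

    def g(lam: float) -> float:
        # Horner evaluation of diffs[0] + diffs[1]*lam + ... + diffs[4]*lam**4.
        total = 0.0
        for d in reversed(diffs):
            total = total * lam + d
        return total

    lo, g_lo = 0.0, g(0.0)
    if g_lo == 0.0:
        return 0.0
    # Scan a uniform grid on [0, 1) for a sign change, then refine by bisection.
    for i in range(1, grid):
        hi = i / grid
        g_hi = g(hi)
        if g_lo * g_hi <= 0.0:
            for _ in range(100):
                mid = 0.5 * (lo + hi)
                g_mid = g(mid)
                if g_lo * g_mid <= 0.0:
                    hi = mid
                else:
                    lo, g_lo = mid, g_mid
            return 0.5 * (lo + hi)
        lo, g_lo = hi, g_hi
    raise ValueError("no root found in [0, 1)")


if __name__ == "__main__":
    lam = solve_lambda(
        0.22,
        [0.0, -5.2, 0.0, 25.4, 10.6, 9.2, 12.3],
        [-2.4, 0.8, 4.0, 2.5, 8.6, -6.4, 6.1],
    )
    print(round(lam, 3))
```

Under the assumed layout, this sketch should land within about 1e-3 of the listed outputs for the three examples above. The grid scan returns only the first sign change it meets, so an input with several roots in [0, 1) would need extra care; a general polynomial root finder such as `numpy.roots` is a reasonable alternative.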
Resources
The concepts explored in this homework are covered by:
● Lectures
○ Lesson 3: TD and Friends
● Readings
○ Sutton (1988)
Submission Details
To complete the assignment, calculate answers to the specific problems given and submit your results at https://rldm.herokuapp.com