Task 7: Reinforcement Learning
Goal: Solve an MDP problem using policy and value iteration.
Note: This assignment is to be done individually.
MDP (Markov Decision Process): Grid World Problem
There will be a grid, and the location of the player in the grid represents a state. One state is designated the start state, and there are two absorbing states with very different rewards (+10 and -200 in the grid below), while every other state has a reward of -1 associated with it: moving into such a state incurs this negative reward. The black block is a wall that your agent cannot pass through. The transition probabilities for moving from one state to another are also given below.
We need to find the optimal movement direction (i.e., the optimal action) for each state.
[Grid figure: a Start cell, a black wall cell, and two absorbing End cells with rewards +10 and -200.]
[Figure: transition probabilities for each movement action.]
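For concreteness, here is one possible encoding of such a grid world in Python; this is a sketch under assumptions, not the official specification. The 3x4 layout, the wall cell at (1, 1), the terminal cells, and the 0.8 / 0.1 / 0.1 slip probabilities are all illustrative guesses and should be replaced by the actual grid and transition probabilities from the figures above.

ROWS, COLS = 3, 4
WALL = {(1, 1)}                                 # assumed wall cell
TERMINALS = {(0, 3): 10.0, (1, 3): -200.0}      # assumed absorbing states and rewards
START = (2, 0)                                  # assumed start cell
STEP_REWARD = -1.0                              # reward for entering a non-terminal state
ACTIONS = {'U': (-1, 0), 'D': (1, 0), 'L': (0, -1), 'R': (0, 1)}
# Perpendicular "slip" directions for each intended action (assumed model).
PERP = {'U': ('L', 'R'), 'D': ('L', 'R'), 'L': ('U', 'D'), 'R': ('U', 'D')}

def states():
    return [(r, c) for r in range(ROWS) for c in range(COLS) if (r, c) not in WALL]

def move(s, a):
    # Deterministic effect of action a; hitting the wall or the border leaves s unchanged.
    r, c = s[0] + ACTIONS[a][0], s[1] + ACTIONS[a][1]
    return s if (r, c) in WALL or not (0 <= r < ROWS and 0 <= c < COLS) else (r, c)

def transitions(s, a):
    # (probability, next_state) pairs under the assumed 0.8/0.1/0.1 slip model.
    if s in TERMINALS:
        return [(1.0, s)]                       # absorbing state
    left, right = PERP[a]
    return [(0.8, move(s, a)), (0.1, move(s, left)), (0.1, move(s, right))]

def reward(s_next):
    # Reward is earned on entering a state, as described in the problem.
    return TERMINALS.get(s_next, STEP_REWARD)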
Develop code for solving the MDP problem using policy and value iteration.
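As a starting point, a minimal sketch of both algorithms is given below, reusing the helpers from the encoding sketch above; gamma (the discount factor) and theta (the convergence threshold) are parameters worth varying in the experiments.

def value_iteration(gamma=0.9, theta=1e-6):
    # Apply Bellman optimality backups until the largest update falls below theta.
    V = {s: 0.0 for s in states()}
    while True:
        delta = 0.0
        for s in states():
            if s in TERMINALS:
                continue
            best = max(sum(p * (reward(s2) + gamma * V[s2])
                           for p, s2 in transitions(s, a)) for a in ACTIONS)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    # Extract the greedy policy from the converged value function.
    policy = {s: max(ACTIONS, key=lambda a: sum(p * (reward(s2) + gamma * V[s2])
                                                for p, s2 in transitions(s, a)))
              for s in states() if s not in TERMINALS}
    return V, policy

def policy_iteration(gamma=0.9, theta=1e-6):
    # Alternate policy evaluation and greedy policy improvement until stable.
    policy = {s: 'U' for s in states() if s not in TERMINALS}   # arbitrary initial policy
    V = {s: 0.0 for s in states()}
    while True:
        # Policy evaluation: iterate the Bellman expectation backup.
        while True:
            delta = 0.0
            for s in policy:
                v = sum(p * (reward(s2) + gamma * V[s2])
                        for p, s2 in transitions(s, policy[s]))
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < theta:
                break
        # Policy improvement: act greedily with respect to V.
        stable = True
        for s in policy:
            best = max(ACTIONS, key=lambda a: sum(p * (reward(s2) + gamma * V[s2])
                                                  for p, s2 in transitions(s, a)))
            if best != policy[s]:
                policy[s], stable = best, False
        if stable:
            return V, policy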
Write a report clearly describing the MDP considered above and your observations on running the policy and value iteration algorithms on the formulated MDP.
Further, suggest ways to check whether the algorithm yields the optimal policy for the setting considered.
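One such check is numerical: an optimal value function satisfies the Bellman optimality equation, so the gap between V(s) and the best one-step lookahead value should be (near) zero at every state. The sketch below computes that residual, reusing the helpers defined earlier; comparing the policies returned by policy iteration and value iteration is another useful sanity check.

def bellman_residual(V, gamma=0.9):
    # Largest gap between V(s) and the best one-step lookahead value;
    # a value near zero indicates V satisfies the Bellman optimality equation.
    gap = 0.0
    for s in states():
        if s in TERMINALS:
            continue
        best = max(sum(p * (reward(s2) + gamma * V[s2])
                       for p, s2 in transitions(s, a)) for a in ACTIONS)
        gap = max(gap, abs(best - V[s]))
    return gap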
Please submit a zip file <Roll_number>.zip with the following contents:
Program: <Roll_number>.<extension> (e.g., 1800100xx.c/cpp)
Report: <Roll_number>.pdf (e.g., 1800100xx.pdf). The report should be in PDF format.
Readme file: readme.txt (Execution details)
Report Format:
MDP Description: Clearly describe (S, A, P, R, N)
State-transition Graph for the MDP
Optimal Policy: Suggest ways to check whether the algorithm yields the optimal policy for the setting considered.
Experimental Results: Vary the gamma parameter and show the policy found in each case by both algorithms (a small driver sketch follows this list).
Comparison of Policy Iteration and Value Iteration
Conclusions
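For the Experimental Results item referenced above, a small driver along the following lines (again reusing the sketches above; the particular gamma values are arbitrary) prints the policy each algorithm finds for several discount factors:

for gamma in (0.1, 0.5, 0.9, 0.99):
    _, pi_vi = value_iteration(gamma=gamma)
    _, pi_pi = policy_iteration(gamma=gamma)
    print(f"gamma={gamma}: policies agree: {pi_vi == pi_pi}")
    for r in range(ROWS):
        # '.' marks wall and terminal cells, which have no action.
        print(' '.join(pi_vi.get((r, c), '.') for c in range(COLS)))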
For Reference: Reinforcement Learning, http://www.cse.iitm.ac.in/~ravi/courses/Reinforcement%20Learning.html (see Lectures 15-25).