Figure 1: Gridworld

As shown in Fig. 1, each grid cell in the Gridworld represents a state. Let s_t denote the state at cell t, so the state space is S = {s_t | t ∈ {0, ..., 35}}. s_1 and s_35 are terminal states; all other states are non-terminal, and from them the agent can move one cell north, east, south, or west. Hence the action space is A = {n, e, s, w}. Note that an action that would lead out of the Gridworld leaves the state unchanged. Each move receives a reward of -1 until a terminal state is reached. A good policy should find the shortest path to a terminal state from any randomly chosen initial non-terminal state.

3 Experiment Requirements

• Programming language: Python 3
• You should build the Gridworld environment and implement both policy iteration and value iteration to improve a uniform random policy π(n|·) = π(e|·) = π(s|·) = π(w|·) = 0.25.

4 Report and Submission

• Your report and source code should be compressed into an archive named “studentID+name”.
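
As a starting point for the environment and the value iteration method required in Section 3, a minimal sketch might look like the following. It is not a reference solution. It assumes a 6x6 grid numbered row by row from the top-left corner (the figure is what actually fixes the layout), an undiscounted return (GAMMA = 1.0), and a convergence threshold THETA; the names N, TERMINALS, step, and value_iteration are hypothetical and chosen only for illustration.

```python
# Sketch only: the 6x6 row-major layout, GAMMA, and THETA are assumptions,
# not values given in the assignment text.
N = 6                      # assumed grid width and height (states 0..35)
TERMINALS = {1, 35}        # terminal states named in the text
ACTIONS = {"n": (-1, 0), "e": (0, 1), "s": (1, 0), "w": (0, -1)}
GAMMA = 1.0                # assumed discount factor (undiscounted episodic task)
THETA = 1e-8               # assumed convergence threshold


def step(state, action):
    """Return (next_state, reward) for a move from a non-terminal state."""
    row, col = divmod(state, N)
    d_row, d_col = ACTIONS[action]
    new_row, new_col = row + d_row, col + d_col
    # A move that would leave the grid keeps the state unchanged.
    if not (0 <= new_row < N and 0 <= new_col < N):
        new_row, new_col = row, col
    return new_row * N + new_col, -1.0


def q_value(values, state, action):
    """One-step lookahead: reward plus discounted value of the next state."""
    next_state, reward = step(state, action)
    return reward + GAMMA * values[next_state]


def value_iteration():
    """Return (values, policy) where policy maps each non-terminal state to a greedy action."""
    values = [0.0] * (N * N)   # terminal states stay at 0
    while True:
        delta = 0.0
        for s in range(N * N):
            if s in TERMINALS:
                continue
            best = max(q_value(values, s, a) for a in ACTIONS)
            delta = max(delta, abs(best - values[s]))
            values[s] = best   # in-place (Gauss-Seidel style) update
        if delta < THETA:
            break
    policy = {
        s: max(ACTIONS, key=lambda a, s=s: q_value(values, s, a))
        for s in range(N * N)
        if s not in TERMINALS
    }
    return values, policy


if __name__ == "__main__":
    values, policy = value_iteration()
    for r in range(N):
        print(" ".join(f"{values[r * N + c]:5.0f}" for c in range(N)))
```

Under these assumptions, each value is the negative of the number of moves to the nearest terminal state, so the greedy policy follows a shortest path. Policy iteration could reuse the same step and q_value helpers, alternating evaluation of the current policy (starting from the uniform random one) with greedy improvement.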