ME5404 Project 2: Q-Learning for World Grid Navigation

I. OBJECTIVE

The objectives of this project are for students to demonstrate:

1. Competence in implementing the Q-learning algorithm, and

2. Understanding of the principles of, and implementation issues related to, the Q-learning algorithm.

II. PROBLEM STATEMENT
Suppose that a robot is to traverse a 10×10 grid, with the top-left and bottom-right cells being the start state and the goal state, respectively, as illustrated in Figure 1.

Fig. 1: Illustration of a 10×10 world grid with start state and goal state. The index of each cell follows the MATLAB column-wise convention.

The robot is to reach the goal state by maximizing the total reward of the trip. Note that the numbers (from 1 to 100) assigned to the individual cells represent the states; they do not represent the rewards for the cells. At a state, the robot can take one of four actions (as shown in Figure 2) to move up (a=1), right (a=2), down (a=3), or left (a=4), entering the corresponding adjacent state deterministically.
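Because the states follow the MATLAB column-wise convention (state index = (col−1)×10 + row), each action corresponds to a fixed change in the state index. The following MATLAB sketch illustrates the deterministic transition; the function name nextState and the convention that off-grid moves leave the state unchanged are illustrative assumptions, not part of the assignment:

    function sNext = nextState(s, a)
    % Deterministic transition on the 10x10 grid with column-wise state
    % indexing: down adds 1, up subtracts 1, right adds 10, left subtracts 10.
    % Assumption (not specified in the assignment): moves off the grid
    % leave the state unchanged.
    [row, col] = ind2sub([10 10], s);
    switch a
        case 1, row = max(row - 1, 1);    % up
        case 2, col = min(col + 1, 10);   % right
        case 3, row = min(row + 1, 10);   % down
        case 4, col = max(col - 1, 1);    % left
    end
    sNext = sub2ind([10 10], row, col);
    end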

Fig. 2: Possible actions of the robot at a given state.

The learning process consists of a series of trials. In each trial, the robot starts at the initial state (s=1) and makes transitions, according to the Q-learning algorithm with ε-greedy exploration, until it reaches the goal state (s=100), upon which the trial ends. This process repeats until the values of the Q-function converge to the optimal values, after which an optimal policy can be obtained.
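As a point of reference, the inner loop of one trial might look as follows in MATLAB. This is a minimal sketch: Q is a 100×4 Q-value matrix, epsilonK, alphaK, and gamma are the parameters specified in Section III, reward(s,a) is the reward lookup described there, and nextState is the helper sketched above; none of these names are prescribed by the assignment.

    s = 1;                                  % start state
    while s ~= 100                          % until the goal state is reached
        if rand < epsilonK                  % explore with probability eps_k
            a = randi(4);                   % random action
        else
            [~, a] = max(Q(s, :));          % greedy (exploiting) action
        end
        sNext = nextState(s, a);
        % Q-learning update:
        % Q(s,a) <- Q(s,a) + alpha_k*(r + gamma*max_a' Q(s',a') - Q(s,a))
        Q(s, a) = Q(s, a) + alphaK * ...
            (reward(s, a) + gamma * max(Q(sNext, :)) - Q(s, a));
        s = sNext;
    end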

III. REQUIREMENT
A. What is to be done

There are two main tasks to be completed for this project.

Task 1: Write a MATLAB (M-file) program to implement the Q-learning algorithm, using the reward function given in task1.mat and the ε-greedy exploration algorithm, with εk, αk, and γ set as specified in Table I.

The file task1.mat is included in the zip file that also contains this document. It can be loaded directly into MATLAB and contains the matrix variable reward (dimension: 100×4), in which each column corresponds to an action and each row to a state. For example, the reward for taking action a=3 at state s=1 to enter state s=2 is given by the (1,3) entry of reward, i.e., ρ(1,3,2) = reward(1,3). Note that rewards can be negative.
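For instance, the stated example can be read off directly in MATLAB (assuming task1.mat is in the current folder):

    load('task1.mat');    % provides the 100x4 matrix variable 'reward'
    r = reward(1, 3);     % rho(1,3,2): reward for action a=3 at state s=1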

TABLE I: Parameter values and performance of Q-Learning

  εk, αk              No. of goal-reached runs     Execution time (sec.)
                        γ=0.5        γ=0.9           γ=0.5        γ=0.9
  ----------------------------------------------------------------------
  1/k                     ?            ?               ?            ?
  100/(100+k)             ?            ?               ?            ?
  (1+log(k))/k            ?            ?               ?            ?
  (1+5 log(k))/k          ?            ?               ?            ?
In this task, εk and αk are set to the same value. You are required to run your program 10 times (i.e., 10 runs) for each set of parameter values and to record the number of times the goal state is reached. (Note: some runs may not yield an optimal policy that results in the robot reaching the goal state; these runs are not to be counted.) The maximum number of trials in each run is set to 3000.

The average program execution time of the “goal-reaching” runs is to be calculated and entered into the table (in place of each “?”). The final output of your program should be an optimal policy (if the goal state is reached in any of the 10 runs) and the reward associated with this optimal policy. The optimal policy is to be expressed as a column vector containing the state transitions, and also illustrated (in your report) on a 10×10 grid with arrows marking the trajectory of the robot moving from state s=1 to state s=100.
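The outer run protocol might be organized along the following lines. This is a hedged sketch: runQLearning and reachesGoal are hypothetical placeholders for your own training routine and goal-reachability check, and only the bookkeeping needed for Table I is shown.

    nGoal = 0; tTotal = 0;
    for run = 1:10
        tic;
        Q = runQLearning(reward, gamma, 3000);  % placeholder: one run, at most 3000 trials
        tRun = toc;
        [~, policy] = max(Q, [], 2);            % greedy action per state (100x1 column vector)
        if reachesGoal(policy)                  % placeholder: does the greedy path reach s=100?
            nGoal = nGoal + 1;
            tTotal = tTotal + tRun;
        end
    end
    avgTime = tTotal / nGoal;                   % average time over goal-reaching runs
                                                % (guard against nGoal == 0 in practice)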

Task 2: Write a MATLAB (M-file) program to implement Q-learning using your own values of the relevant parameters, assuming the same 10×10 grid. This M-file will be used to find the optimal policy using a reward function not provided to the students, as part of the assessment.
