CS7642 Homework #4Q-Learning Solved

Your shopping cart is empty.

Problem
Description
In this   homework       you      will      have    the      complete          RL experience.      You     will       work   towards            implementing and evaluating the Q-learning algorithm on a simple domain. Q-learning       is a fundamental RL algorithm and has been successfully       used to solve a variety of decision-making problems. For this homework, you    will      have to         think   carefully           about algorithm         implementation, specially           exploration parameters.

The domain you will be tackling is called Taxi (Taxi-v2). It is a discrete MDP which has been used for RL research in the past. This      will also be your first opportunity       to         become            familiar             with    the      OpenAI Gym     environment   ( https://gym.openai.com/       ). This   is a cool       and     unique              platform           where users can            test their    RL        algorithms       over    a selection       of         domains.

The Taxi problem was introduced in Dietterich(2000). It is a grid-based   domain where the      goal of the agent          is to    pick     up        a passenger          at         one     location            and                 drop     them off in another. There are      4 fixed locations,           each assigned a different                         letter. The agent                       has 6 actions; 4 for         movement,      1 for   pickup, and      1 for   dropoff. The domain has a discrete state space and deterministic transitions.

Procedure

Implement a basic version of the Q-learning algorithm and          use      it to             solve the            taxi      domain.            The      agent should explore the MDP, collect data to the learn the    optimal policy and the optimal Q-value function. ( Be mindful    of         how                you handle        terminal states,              typically            if St     is a terminal state, V(St+1) = 0).      Use      𝛾 = 0.90.         Also,   you      will      see                  how Epsilon-Greedy strategy            can      find     the      optimal             policy despite   of         finding sub-optimal q-values. Because we are   looking for       optimal             q-values, you will have to         try       different             exploration      strategies.

You can evaluate your agent      offline or        by        uploading         your experiment file             to the OpenAI server using a GitHub     account. The latter will generate a learning curve            ( reward/steps vs         episodes)           which is indicative       of                     performance. The OpenAI server will indicate if your implementation has solved the            domain,              that     is,        found an        optimal policy. Note that all evaluations uploaded to the OpenAI server         are      publicly             accessible.          However,                     please do not attach your code as a gist write-up to              these evaluation       .

1

Examples
Below are the optimal Q values for 5 (state, action) pairs of the Taxi domain.

Q(462, 4) = -11.374402515 Q(398, 3) = 4.348907
Q(253, 0) = -0.5856821173
Q(377, 1) = 683
Q(83, 5) = -12.8232660372
Resources
The concepts explored in this homework are covered by:

LecturesConvergence
○ Exploring Exploration

Readings

Asmuth-Littman-Zinkov-2008.pdf
littman-1996.pdf
○                                   ( chapters 1-2)

2

Shopping cart

US$0

CS7642 Homework​ ​#4Q-Learning Solved

More products

CS7642 Homework #4Q-Learning Solved