Contact Us
Projects | Assignments
sellfycode@gmail.com
More
Get Homework Help
English
Español
Português
Français
Deutsch
Italiano
Русский
close
Contact Us
Projects | Assignments
sellfycode@gmail.com
Starting from:
$25
Add to cart
CS7642 Homework #4Q-Learning Solved
Problem
Description
In this homework you will have the complete RL experience. You will work towards implementing and evaluating the Q-learning algorithm on a simple domain. Q-learning is a fundamental RL algorithm and has been successfully used to solve a variety of decision-making problems. For this homework, you will have to think carefully about algorithm implementation, specially exploration parameters.
The domain you will be tackling is called Taxi (Taxi-v2). It is a discrete MDP which has been used for RL research in the past. This will also be your first opportunity to become familiar with the OpenAI Gym environment ( https://gym.openai.com/ ). This is a cool and unique platform where users can test their RL algorithms over a selection of domains.
The Taxi problem was introduced in Dietterich(2000). It is a grid-based domain where the goal of the agent is to pick up a passenger at one location and drop them off in another. There are 4 fixed locations, each assigned a different letter. The agent has 6 actions; 4 for movement, 1 for pickup, and 1 for dropoff. The domain has a discrete state space and deterministic transitions.
Procedure
Implement a basic version of the Q-learning algorithm and use it to solve the taxi domain. The agent should explore the MDP, collect data to the learn the optimal policy and the optimal Q-value function. ( Be mindful of how you handle terminal states, typically if St is a terminal state, V(St+1) = 0). Use 𝛾 = 0.90. Also, you will see how Epsilon-Greedy strategy can find the optimal policy despite of finding sub-optimal q-values. Because we are looking for optimal q-values, you will have to try different exploration strategies.
You can evaluate your agent offline or by uploading your experiment file to the OpenAI server using a GitHub account. The latter will generate a learning curve ( reward/steps vs episodes) which is indicative of performance. The OpenAI server will indicate if your implementation has solved the domain, that is, found an optimal policy. Note that all evaluations uploaded to the OpenAI server are publicly accessible. However, please do not attach your code as a gist write-up to these evaluation .
1
Examples
Below are the optimal Q values for 5 (state, action) pairs of the Taxi domain.
Q(462, 4) = -11.374402515 Q(398, 3) = 4.348907
Q(253, 0) = -0.5856821173
Q(377, 1) = 683
Q(83, 5) = -12.8232660372
Resources
The concepts explored in this homework are covered by:
LecturesConvergence
○ Exploring Exploration
Readings
Asmuth-Littman-Zinkov-2008.pdf
littman-1996.pdf
○ ( chapters 1-2)
2
Starting from:
$25
Add to cart
1 file (1.8MB)
More products
CECS328 Extra Credit 5 Solved
$20
Add to cart
CECS328 Extra Credit 4 Solved
$20
Add to cart
CECS328 Extra Credit 3 Solved
$20
Add to cart