CS489 - Assignment 4

Reinforcement Learning

 
1 Introduction
The goal of this assignment is to experiment with the deep Q-network (DQN), which combines the advantages of Q-learning and neural networks. In classical Q-learning methods, the action-value function Q becomes intractable as the state space and action space grow. DQN brings in the success of deep learning and has achieved a super-human level of play in Atari games. Your goal is to implement the DQN algorithm and an improved variant of it, and play with them in some classical RL control scenarios.

A key modification in DQN is the target network: every C updates the network Q is cloned to obtain a target network Q̂, which is used to generate the Q-learning targets y_j for the following C updates to Q. The delay between the time an update to Q is made and the time the update affects the targets y_j makes divergence or oscillations much more unlikely.
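In the notation used below (θ for the online network weights, θ⁻ for the periodically copied target-network weights, φ for the preprocessed state), this amounts to regressing Q toward a bootstrapped target. Written out explicitly, with squared error as one common choice of loss (Huber loss is another):

y_j = r_j + γ max_{a'} Q̂(φ_{j+1}, a'; θ⁻)
L(θ) = E_{(φ_j, a_j, r_j, φ_{j+1}) ~ D} [ (y_j − Q(φ_j, a_j; θ))² ]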
2 Deep Q-learning
Algorithm 1: deep Q-learning with experience replay.
Initialize replay memory D to capacity N
Initialize action-value function Q with random weights θ
Initialize target action-value function Q̂ with weights θ⁻ = θ
For episode = 1, M do
    Initialize sequence s_1 = {x_1} and preprocessed sequence φ_1 = φ(s_1)
    For t = 1, T do
        With probability ε select a random action a_t
        otherwise select a_t = argmax_a Q(φ(s_t), a; θ)
        Execute action a_t in emulator and observe reward r_t and image x_{t+1}
        Set s_{t+1} = s_t, a_t, x_{t+1} and preprocess φ_{t+1} = φ(s_{t+1})
        Store transition (φ_t, a_t, r_t, φ_{t+1}) in D
        Sample random minibatch of transitions (φ_j, a_j, r_j, φ_{j+1}) from D
        Set y_j = r_j                                      if episode terminates at step j+1
                  r_j + γ max_{a'} Q̂(φ_{j+1}, a'; θ⁻)      otherwise
        Perform a gradient descent step on (y_j − Q(φ_j, a_j; θ))² with respect to the network parameters θ
        Every C steps reset Q̂ = Q
    End For
End For
Figure 1: Deep Q-learning with experience replay
You can refer to the original paper for the details of the DQN: “Human-level control through deep reinforcement learning.” Nature 518.7540 (2015): 529.
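To make Algorithm 1 concrete for this assignment, the following is a minimal sketch of how the pieces (replay memory D, online network Q, target network Q̂, ε-greedy exploration, and the periodic reset Q̂ = Q every C steps) might fit together in PyTorch for MountainCar-v0. The network size, hyperparameters, and helper names such as QNet are illustrative assumptions rather than requirements, and the gym calls assume the pre-0.26 API (env.reset() returns the observation, env.step() returns four values).

# Minimal DQN sketch for MountainCar-v0 (PyTorch + gym).
# Hyperparameters and class names (QNet) are illustrative choices,
# not part of the assignment specification.
import random
from collections import deque

import gym
import numpy as np
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Small fully connected network mapping a state to one Q-value per action."""
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )
    def forward(self, x):
        return self.net(x)

env = gym.make("MountainCar-v0")
state_dim = env.observation_space.shape[0]   # 2: position, velocity
n_actions = env.action_space.n               # 3: push left, no push, push right

q = QNet(state_dim, n_actions)
q_target = QNet(state_dim, n_actions)
q_target.load_state_dict(q.state_dict())     # Q̂ starts as a copy of Q
optimizer = torch.optim.Adam(q.parameters(), lr=1e-3)

buffer = deque(maxlen=100_000)               # replay memory D
gamma, batch_size, target_update_every = 0.99, 64, 1000
epsilon, step = 1.0, 0

for episode in range(1000):
    state = env.reset()
    done = False
    while not done:
        # epsilon-greedy action selection
        if random.random() < epsilon:
            action = env.action_space.sample()
        else:
            with torch.no_grad():
                action = q(torch.as_tensor(state, dtype=torch.float32)).argmax().item()

        next_state, reward, done, _ = env.step(action)
        buffer.append((state, action, reward, next_state, float(done)))
        state = next_state
        step += 1
        epsilon = max(0.05, epsilon * 0.9999)  # decay exploration over time

        if len(buffer) >= batch_size:
            batch = random.sample(buffer, batch_size)
            s, a, r, s2, d = map(np.array, zip(*batch))
            s = torch.as_tensor(s, dtype=torch.float32)
            a = torch.as_tensor(a, dtype=torch.int64).unsqueeze(1)
            r = torch.as_tensor(r, dtype=torch.float32)
            s2 = torch.as_tensor(s2, dtype=torch.float32)
            d = torch.as_tensor(d, dtype=torch.float32)

            # y_j = r_j if terminal, else r_j + gamma * max_a' Q̂(s_{j+1}, a')
            with torch.no_grad():
                y = r + gamma * (1 - d) * q_target(s2).max(dim=1).values
            q_sa = q(s).gather(1, a).squeeze(1)
            loss = nn.functional.mse_loss(q_sa, y)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        if step % target_update_every == 0:
            q_target.load_state_dict(q.state_dict())  # every C steps reset Q̂ = Q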
3 Experiment Description
•    Programming language: python3
•    You should compare the performance of DQN and one kind of improved DQN, and test them in a classical RL control environment, MountainCar. OpenAI Gym provides this environment, implemented in Python (https://gym.openai.com/envs/MountainCar-v0/). Moreover, Gym also provides other, more complex environments such as Atari games and MuJoCo.
 
Since the state is abstracted into the car's position and velocity, convolutional layers are not necessary in our experiment. You can get started with OpenAI Gym by referring to this link (https://gym.openai.com/docs/). Note that it is suggested to implement your neural network with TensorFlow or PyTorch.
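If you have not used Gym before, the basic interaction loop looks roughly like the following; the random policy and episode count are placeholders to be replaced by your DQN agent, and the exact reset/step signatures depend on your gym version (this sketch assumes the pre-0.26 API).

import gym

# Random-agent loop for MountainCar-v0, assuming the pre-0.26 gym API
# (env.reset() returns only the observation, env.step() returns 4 values).
env = gym.make("MountainCar-v0")

for episode in range(5):
    obs = env.reset()            # obs = [position, velocity]
    total_reward, done = 0.0, False
    while not done:
        action = env.action_space.sample()   # replace with your DQN policy
        obs, reward, done, info = env.step(action)
        total_reward += reward
    print(f"episode {episode}: return = {total_reward}")

env.close()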
4 Report and Submission
•    Your report and source code should be compressed and named after “studentID+name+assignment4”.
•    The files should be submitted on Canvas before Apr. 30, 2021.
