CMPE58Y Homework 3: Policy Gradient With Function Approximation

1       Introduction

In this homework you will implement the policy gradient algorithm with a neural network for the cart pole task [1] in the OpenAI Gym environment. As in the previous homework, do not pay attention to the done variable; simply terminate the episode after 500 iterations. You can consider the task solved if you consistently get a reward of +450.
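For concreteness, here is a minimal rollout sketch under these rules. It assumes the classic (pre-gymnasium) Gym step API, and select_action is a hypothetical placeholder for whatever policy you implement; this is a sketch of the episode mechanics, not the required structure.

```python
# Minimal rollout sketch, assuming the classic Gym API where step()
# returns (obs, reward, done, info). `select_action` is a hypothetical
# placeholder for your policy.
import gym

env = gym.make("CartPole-v1")

def run_episode(select_action, max_steps=500):
    """Roll out one episode for a fixed 500 steps, ignoring `done`."""
    obs = env.reset()
    states, actions, rewards = [], [], []
    for _ in range(max_steps):
        action = select_action(obs)               # 0 or 1
        states.append(obs)
        actions.append(action)
        obs, reward, done, _ = env.step(action)   # `done` is deliberately ignored
        rewards.append(reward)
    return states, actions, rewards
```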

2       Policy Gradient
As explained in the lecture, your RL agent can be a neural network. Since the environment is not complex, in this homework you will use a single layer with at most 4 neurons. (Our implementation has a single neuron and solves the task in approximately 50 episodes, with 50 rollouts per episode. Given the 4-dimensional state space, the neuron has 4 weights and 1 bias. The activation function is a sigmoid, the discount factor is 0.99, and the learning rate is 0.05. The average reward of the rollouts is used as the baseline. Remember to check the course website for the explanation of the causality principle.)
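As a concrete illustration of this single-neuron setup, a sketch might look as follows; the names (theta, b, GAMMA, LR, policy_prob) and the weight initialization are my own choices, not mandated by the handout.

```python
# A sketch of the single-neuron policy described above. All names
# and the initialization scheme are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(scale=0.1, size=4)   # one weight per state dimension
b = 0.0                                 # single bias
GAMMA = 0.99                            # discount factor
LR = 0.05                               # learning rate

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def policy_prob(s):
    """p = sigmoid(theta . s + b): probability of one of the two actions."""
    return sigmoid(np.dot(theta, s) + b)
```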

In the lecture we used a Gaussian distribution as the probability distribution over actions, but the cart pole task has 2 discrete actions, so a Gaussian distribution is not a reasonable choice. Instead, a Bernoulli distribution will be used, and the output of the network will be the probability p of pushing the cart in one of the two directions (the other direction naturally has probability 1 − p). Remember that:

$$\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\!\left[\sum_{t=1}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \sum_{k=t}^{T} \gamma^{\,k-t}\, R(s_k, a_k)\right] \tag{1}$$

Here, with $n \in \{0, 1\}$ denoting the sampled action:

$$\pi_\theta(a_t \mid s_t) = p^{\,n}\,(1 - p)^{1-n} \tag{2}$$

where

$$p = P_\theta(a_t) = \operatorname{sigmoid}(\theta^\top s + b) \tag{3}$$
So, substituting (2) into (1):

$$\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\!\left[\sum_{t=1}^{T} \nabla_\theta \log\!\left(p^{\,n}(1-p)^{1-n}\right) \sum_{k=t}^{T} \gamma^{\,k-t}\, R(s_k, a_k)\right] \tag{4}$$

and, expanding the logarithm:

$$\nabla_\theta \log \pi_\theta(a_t \mid s_t) = \frac{n}{p}\,\nabla_\theta p - \frac{1-n}{1-p}\,\nabla_\theta p \tag{5}$$
Because of the derivative property of the sigmoid:

$$\nabla_\theta p = \nabla_\theta P_\theta(a_t) = p\,(1 - p)\,s \tag{6}$$

Then, substituting (6) into (5):

$$\nabla_\theta \log \pi_\theta(a_t \mid s_t) = n\,(1 - p)\,s + (1 - n)\,(-p)\,s \tag{7}$$
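In code, equations (6) and (7) reduce to a few lines. The sketch below assumes s is a NumPy array and n is the sampled action; the bias gradient (the same derivation with s replaced by 1) is my addition, since the handout writes only the θ term explicitly.

```python
# Sketch of equations (6)-(7). Assumes `s` is a NumPy array and
# `n` in {0, 1} is the sampled action.
def grad_log_pi(s, n, p):
    g_theta = n * (1.0 - p) * s + (1.0 - n) * (-p) * s   # eq. (7); equals (n - p) * s
    g_b = n * (1.0 - p) + (1.0 - n) * (-p)               # bias term, assumed (s -> 1)
    return g_theta, g_b
```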
After you have correctly calculated these gradients, you can update your parameters using stochastic gradient descent, as in the second homework.
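To tie the pieces together, one possible shape of a training step is sketched below, reusing the earlier sketches (run_episode, policy_prob, grad_log_pi). Reading "average reward of the rollouts" as the mean total rollout reward is my interpretation; any constant baseline leaves the gradient estimator unbiased.

```python
# One possible training step: REINFORCE with a baseline and
# causality-respecting discounted returns, using the sketches above.
# This is an illustration, not the required structure.
def train_step(n_rollouts=50):
    global theta, b
    trajectories = []
    for _ in range(n_rollouts):
        # sample n ~ Bernoulli(p) at each step
        select_action = lambda s: int(rng.random() < policy_prob(s))
        trajectories.append(run_episode(select_action))

    # average total rollout reward as baseline (one reading of the handout)
    baseline = np.mean([sum(rewards) for (_, _, rewards) in trajectories])

    g_theta = np.zeros_like(theta)
    g_b = 0.0
    for states, actions, rewards in trajectories:
        # discounted reward-to-go: sum over k >= t of gamma^(k-t) * R_k (causality)
        G, returns = 0.0, []
        for r in reversed(rewards):
            G = r + GAMMA * G
            returns.append(G)
        returns.reverse()
        for s, n, G_t in zip(states, actions, returns):
            p = policy_prob(s)
            gt, gb = grad_log_pi(s, n, p)
            g_theta += gt * (G_t - baseline)
            g_b += gb * (G_t - baseline)

    # gradient *ascent* on J(theta), averaged over rollouts
    theta += LR * g_theta / n_rollouts
    b += LR * g_b / n_rollouts
```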
