CS 446/546
Advanced Topics in Machine Learning
Programming Assignment #2
In this homework you will write code to implement tic-tac-toe and train an agent to play it using techniques from Reinforcement Learning.
Your Assignment:
Part 1: Write a simple simulator for tic-tac-toe that encodes (and displays) a standard 3x3 grid, with a check for legal moves and for the "goal state" (i.e., did someone win, or is the game a draw).
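For example, a small board class along the following lines would cover these requirements (a minimal sketch in Python; the class and method names are illustrative choices, not requirements):

import numpy as np

class TicTacToe:
    """3x3 board: 0 = empty, 1 = X (the agent), -1 = O (the opponent)."""

    def __init__(self):
        self.board = np.zeros(9, dtype=int)

    def legal_moves(self):
        # Legal moves are the indices of empty cells.
        return [i for i in range(9) if self.board[i] == 0]

    def play(self, cell, player):
        assert self.board[cell] == 0, "illegal move"
        self.board[cell] = player

    def winner(self):
        # Returns 1 or -1 if that player has three in a row, 0 otherwise.
        lines = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
                 (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
                 (0, 4, 8), (2, 4, 6)]              # diagonals
        for a, b, c in lines:
            s = self.board[a] + self.board[b] + self.board[c]
            if abs(s) == 3:
                return np.sign(s)
        return 0

    def game_over(self):
        # Goal state: someone has won, or the board is full (draw).
        return self.winner() != 0 or not self.legal_moves()

    def show(self):
        symbols = {0: '.', 1: 'X', -1: 'O'}
        for row in self.board.reshape(3, 3):
            print(' '.join(symbols[v] for v in row))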
Implement a form of Q-learning to train a tic-tac-toe playing “agent”. One straightforward option for Q-learning is to use a Q-matrix, in which the rows correspond to states and the columns correspond to actions (you have the option of using a more sophisticated method than a Q-matrix if you wish, such as a Q-network). The Q-matrix is commonly initialized to all zeros at the beginning of a run (although you may use a different initialization strategy if you prefer – please note this in your write-up).
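If you use a Q-matrix, one simple (illustrative) indexing scheme is to treat the board as a base-3 number, giving 3^9 = 19,683 rows (this includes unreachable boards, but keeps the lookup trivial) and 9 columns, one per cell:

import numpy as np

N_STATES = 3 ** 9    # every assignment of {empty, X, O} to the 9 cells
N_ACTIONS = 9        # one action per cell

Q = np.zeros((N_STATES, N_ACTIONS))   # all-zero initialization, as suggested above

def state_index(board):
    """Encode a board (cell values in {0, 1, -1}) as a base-3 integer row index."""
    index = 0
    for v in board:
        index = 3 * index + (v % 3)   # maps -1 -> 2, so each digit is in {0, 1, 2}
    return index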
At each time step t during an episode, your code should do the following:
• Observe the current state s_t
• Choose an action a_t, using ε-greedy action selection (I recommend training on-policy)
• Perform the action
• Observe the new state s_{t+1} and receive the reward r_t (at the conclusion of a game, this includes the terminal reward for the outcome: win, loss, or draw)
• Update Q(s_t, a_t) ← Q(s_t, a_t) + η ( r_t + γ max_{a'} Q(s_{t+1}, a') − Q(s_t, a_t) ) (a sketch of one such time step appears just after this list)
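Here is a minimal sketch of one such time step, assuming the TicTacToe class and state_index function from the earlier sketches; where exactly the opponent moves, and how intermediate rewards are assigned, are design choices left to you:

import numpy as np

def q_learning_step(Q, env, state, epsilon, eta, gamma, rng):
    """One time step of tabular Q-learning for the agent (player +1)."""
    moves = env.legal_moves()

    # epsilon-greedy action selection over the legal moves only
    if rng.random() < epsilon:
        action = int(rng.choice(moves))
    else:
        action = max(moves, key=lambda a: Q[state, a])

    env.play(action, player=1)                # perform the action
    # ... let the opponent respond here during training, if desired ...

    next_state = state_index(env.board)       # observe s_{t+1}
    # One simple reward scheme: +1 win, -1 loss, 0 otherwise.
    reward = float(env.winner()) if env.game_over() else 0.0

    # Q-learning update; do not bootstrap from a terminal state.
    if env.game_over():
        target = reward
    else:
        target = reward + gamma * max(Q[next_state, a] for a in env.legal_moves())
    Q[state, action] += eta * (target - Q[state, action])
    return next_state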
For choosing actions with ε-greedy action selection, initialize ε (for example: ε = 0.1), and decrease it by a constant value Δ every m epochs until it reaches 0 (I encourage you to experiment with different choices for Δ and m, as well as the initial value for ε).
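One possible decay schedule looks like the following; the values of Δ, m, and the number of epochs are purely illustrative:

n_epochs = 2000        # hypothetical number of training epochs
epsilon = 0.1          # initial epsilon (the example value above; try others)
delta = 0.005          # the constant decrement Delta (an illustrative choice)
m = 50                 # decay epsilon every m epochs (also illustrative)

for epoch in range(n_epochs):
    # ... run this epoch's training episodes with the current epsilon ...
    if (epoch + 1) % m == 0:
        epsilon = max(epsilon - delta, 0.0)   # floor at 0 so epsilon never goes negative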
After each training epoch, test your agent on 10 games against a baseline random opponent. Record your agent's total score against the baseline over these 10 games (+1 for a win, +0.5 for a draw, 0 for a loss). After training is completed, include a plot of training progress, with the epoch number on the horizontal axis and (total score)/10 against the baseline opponent on the vertical axis.
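A sketch of this evaluation and plot, again building on the earlier illustrative TicTacToe and state_index sketches (the function name evaluate, the matplotlib usage, and the convention that the agent always moves first are assumptions, not requirements):

import matplotlib.pyplot as plt
import numpy as np

def evaluate(Q, n_games=10, rng=None):
    """Play n_games greedy (epsilon = 0) games against a random opponent.

    Returns the total score: +1 per win, +0.5 per draw, 0 per loss."""
    rng = rng or np.random.default_rng()
    score = 0.0
    for _ in range(n_games):
        env = TicTacToe()
        while not env.game_over():
            state = state_index(env.board)
            moves = env.legal_moves()
            env.play(max(moves, key=lambda a: Q[state, a]), player=1)   # greedy agent
            if not env.game_over():
                env.play(int(rng.choice(env.legal_moves())), player=-1) # random opponent
        score += {1: 1.0, 0: 0.5, -1: 0.0}[int(env.winner())]
    return score

# After training: scores[i] holds the evaluate(Q) result recorded after epoch i.
scores = []   # filled in during the training loop
plt.plot(np.arange(len(scores)), np.array(scores) / 10.0)
plt.xlabel('epoch')
plt.ylabel('(total score) / 10 vs. random baseline')
plt.savefig('training_progress.png')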
When you have completed training, play 10 games against your agent and report these results.
Parts 2 and 3 below are optional.
Part 2: Experiment with Learning Rate. Choose 4 different values for the learning rate, η, approximately evenly spaced in the range [0,1], keeping the other parameters set as in Part 1. For each value, give a performance plot as described above. Discuss how the choice of learning rate affects these results.
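A hypothetical outline of this sweep, where train_agent and plot_progress stand in for your own Part 1 training loop and plotting code:

# Illustrative sweep over four learning rates, keeping the other Part 1 settings fixed.
# train_agent and plot_progress are hypothetical placeholders for your own functions.
for eta in [0.2, 0.4, 0.6, 0.8]:
    Q, scores = train_agent(eta=eta)
    plot_progress(scores, label=f"eta = {eta}")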
Part 3: Q-network. Use a Q-network in place of a Q-matrix; describe your network's architecture and how you trained it. Provide a performance plot.
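If you go this route, a small feed-forward network is sufficient; the sketch below uses PyTorch, and the layer sizes, optimizer, and single-transition update are illustrative assumptions rather than a prescribed design:

import torch
import torch.nn as nn

# One possible (hypothetical) Q-network: 9 board cells in, 9 action values out.
q_net = nn.Sequential(
    nn.Linear(9, 64),   # input: the 9 cells encoded as {-1, 0, +1}
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 9),   # output: one Q-value per cell/action
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def update(board, action, target):
    """One gradient step toward the Q-learning target for a single transition."""
    x = torch.tensor(board, dtype=torch.float32)
    q = q_net(x)[action]
    loss = loss_fn(q, torch.tensor(target, dtype=torch.float32))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()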
Here is what you need to turn in:
Your spell-checked, double-spaced report with the information requested above, and your (briefly) commented code with instructions for running it. Your report should include a detailed summary of your approach and results.
How to turn it in:
• Send these items in electronic format to me (arhodes@pdx.edu) on the due date. No hard copy please!
• The report should be in pdf format and the code should be in its original format (e.g., .py, .m, etc.)