Starting from:

$29.99

CS6046 Assignment 3- Multi-Armed Bandits Solution


SCORING: There are 2 question in this assignment which contributes 10 points each towards your final grade.
WHAT SHOULD YOU SUBMIT? You should submit a zip file titled ’Solutions rollnumber.zip’ Your assignment will NOT be graded if it does not contain all of the following: • A PDF file which includes explanations regarding each of the solution as required in the question. Title this file as ‘Report.pdf’
• Source code for all the programs that you write for the assignment clearly named.
CODE LIBRARY: You are expected to code all algorithms from scratch. You cannot use standard inbuilt libraries for the algorithms. You are free to use inbuilt libraries for plots. You can code using either Python or Matlab or C.
GUIDELINES: Keep the below points in mind before submission.
• Don’t be vague in your explanations. The clearer your answer is, the more chance it will be scored higher.
Q1:Gaussian Explore Then Commit : Investigate the empirical behaviour of the Explore then Commit algorithm when the loss of the two arms are standard Gaussian random variables with mean µ1 = 0 and µ2 = ∆.
• Write a piece of code for a function that takes T and δ as input and returns the number of rounds T0 needed for exploration which minimizes the regret.
• For ∆ = 0.1 and T = 1000, plot the regret as a function of T0. Plot error bars for the regret for each value of T0 over 10,000 runs.
• Explain your insights from the plots and the theoretical bound you obtained in the first part of the question.
• Instead of standard Gaussian arms, use Gaussians with same mean as above but variance σ2 for both arms. Try different choices of σ2 and explain your insights for the above problem.
Q2: Thompson vs UCB
Consider the same setting of two armed Standard Gaussian Bandits as in Question 1. • How will you adapt the UCB and Thompson sampling algorithms to work for this setting?
• Fix T = 10000 and ∆ = 0.1 and compare the regret bound of both the algorithms. What are your observations.
• Change ∆ in steps of 0.1 from 0.1 till 1. How do these algorithms perform? What are your observations?

More products