Assignment 2
SCORING: There are 4 questions in this assignment. The first three questions are mandatory and carry 5 points each. The last question is a play question; it is not mandatory and also carries 5 points. The contribution of this assignment towards your final grade will be calculated as min{your score, 15}.
The points will be awarded based on the clarity and rigour of the report and the correctness of the submitted code.
DATASETS: The data-sets are in the corresponding Google Drive folder shared on Moodle.
WHAT SHOULD YOU SUBMIT? You should submit a zip file titled ‘Solutions rollnumber.zip’, where rollnumber is your institute roll number. Your assignment will NOT be graded if it does not contain all of the following:
• A text file titled ‘Details.txt’ with your name and roll number.
• A PDF file which includes explanations for each solution as required in the questions. Title this file ‘Report.pdf’.
• Clearly named source code for all the programs that you write for the assignment.
CODE LIBRARY: You are expected to code all algorithms from scratch. You cannot use standard inbuilt libraries for computations. The only allowed library functions are those that compute the eigenvectors and eigenvalues of matrices. If your code calls any other library function for computation, it will fetch 0 points. You are free to use inbuilt libraries for plotting. You can code in Python, Matlab, or C.
GUIDELINES: Keep the following points in mind before submission.
• Any graph that you plot will not be graded unless its x-axis and y-axis are clearly labelled.
• Don’t be vague in your explanations. The clearer your answer, the higher it is likely to score.
QUESTIONS
(1) You are given a data-set, in the file A2Q1Data.csv, with 1000 data points generated from some probabilistic mixture.
i. Determine which probabilistic mixture could have generated this data (it is not a Gaussian mixture). Derive the EM algorithm for your choice of mixture and show your calculations. Write code implementing the algorithm you derived, setting the number of mixture components K = 4. Plot the log-likelihood (averaged over 100 random initializations) as a function of the iterations.
ii. Assume that the same data was in fact generated from a mixture of 4 Gaussians. Implement the EM algorithm and plot the log-likelihood (averaged over 100 random initializations of the parameters) as a function of the iterations. How does this plot compare with the plot from part (i)? Provide insights that you draw from this experiment.
iii. Run the K-means algorithm with K = 4 on the same data. Plot the K-means objective as a function of the iterations.
iv. Among the three algorithms implemented above, which would you choose for this dataset, and why?
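For part iii, a minimal from-scratch K-means sketch is given below to illustrate the expected structure (the function name, seed, and iteration count are illustrative choices, not requirements). It assumes the CSV has already been loaded into an (n, d) NumPy array `X`, and it records the objective after each assignment step so it can be plotted against iterations.

```python
import numpy as np

def kmeans(X, K=4, iters=50, seed=0):
    """Plain K-means on an (n, d) array X. Returns labels and the
    objective (sum of squared distances to assigned centers) per iteration."""
    rng = np.random.default_rng(seed)
    # initialize centers as K distinct data points chosen at random
    centers = X[rng.choice(len(X), K, replace=False)].astype(float)
    objective = []
    for _ in range(iters):
        # assignment step: nearest center in squared Euclidean distance
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        objective.append(d2[np.arange(len(X)), labels].sum())
        # update step: each center becomes the mean of its assigned points
        for k in range(K):
            if (labels == k).any():
                centers[k] = X[labels == k].mean(axis=0)
    return labels, objective
```

Because both steps can only decrease the objective, the recorded sequence should be non-increasing, which is a useful sanity check on your own implementation.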
(2) You are given a data-set in the file A2Q2Data train.csv with 10000 points in R^100 × R (each row is a data point whose first 100 components are the features and whose last component is the associated y value).
i. Obtain the least-squares solution w_ML to the regression problem using the analytical solution.
ii. Code the gradient descent algorithm with a suitable step size to solve the least-squares problem, and plot ‖w_t − w_ML‖_2 as a function of t. What do you observe?
iii. Code the stochastic gradient descent algorithm with a batch size of 100 and plot ‖w_t − w_ML‖_2 as a function of t. What are your observations?
iv. Code the gradient descent algorithm for ridge regression. Cross-validate over various choices of λ and plot the validation-set error as a function of λ. For the best λ chosen, obtain w_R. Compare the test error (on the test data in the file A2Q2Data test.csv) of w_R with that of w_ML. Which is better, and why?
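For part ii, the shape of the experiment can be sketched as below. Note that the sketch uses np.linalg.pinv only to obtain w_ML for the comparison plot; in your submission you must compute w_ML from scratch (e.g. via the eigendecomposition of XᵀX, which the assignment permits). The step-size rule shown (inverse of the largest eigenvalue of XᵀX/n) is one safe illustrative choice, not the only valid one.

```python
import numpy as np

def least_squares_gd(X, y, step=None, iters=200):
    """Gradient descent on f(w) = ||Xw - y||^2 / (2n).
    Returns the final iterate and ||w_t - w_ML||_2 at each iteration t."""
    n, d = X.shape
    w_ml = np.linalg.pinv(X) @ y  # analytical solution, for comparison only
    if step is None:
        # safe step size: 1 / largest eigenvalue of the Hessian X^T X / n
        step = 1.0 / np.linalg.eigvalsh(X.T @ X / n).max()
    w = np.zeros(d)
    dist = []
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / n  # gradient of the least-squares objective
        w -= step * grad
        dist.append(np.linalg.norm(w - w_ml))
    return w, dist
```

Plotting `dist` against the iteration index t gives the curve the question asks for; with a constant step size below the stability threshold it should decay monotonically.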
(3) Consider estimating the mean of a univariate Gaussian random variable with mean µ and variance 1. Assume you have access to a single sample y from this distribution. What is the maximum likelihood estimator µ_ML for µ? Now consider shrinking your estimator µ_ML to a new estimator µ_new = µ_ML/2. Can this shrunk estimator have a lower MSE than the ML estimator? If yes, under what condition on µ can this happen? If no, prove it.
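Whatever condition you derive, you can sanity-check it empirically by Monte-Carlo simulation; the sketch below estimates both MSEs for an arbitrary chosen µ (the function name, trial count, and seed are illustrative) without presupposing the answer.

```python
import numpy as np

def empirical_mse(mu, trials=200_000, seed=0):
    """Monte-Carlo MSE of mu_ML = y and mu_new = y/2
    for a single sample y ~ N(mu, 1)."""
    rng = np.random.default_rng(seed)
    y = rng.normal(mu, 1.0, size=trials)
    mse_ml = np.mean((y - mu) ** 2)       # compare with your closed-form MSE
    mse_new = np.mean((y / 2 - mu) ** 2)  # compare with your closed-form MSE
    return mse_ml, mse_new
```

Evaluating this at several values of µ and comparing the two numbers should agree with the condition you derive analytically.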
(4) Play Question: In the above question, given an interval [a, b] with a < b, can you construct an estimator whose MSE on this interval is lower than that of the maximum likelihood estimator on the same interval? Can you prove why your estimator achieves this property?