where (1) f(x) = 1 + 2sin(5x) − sin(15x) (2) and ǫ is additive noise that is Gaussian distributed with zero mean and unit variance. The added noises for different x values are independent.
Using this model, we would like to test polynomial fitting. Specifically,
(a) Generate 51 equally spaced x values between 0 and 1:
. (3)
(b) Generate yi values for these xi values:
yi = f(xi) + ǫi, i = 0,1,...,50. (4) where ǫi are independently and identically distributed standard Gaussian random variables. The set of generated values (xi,yi),i = 0,...,50, will serve as the training data.
(c) Fit a polynomial of order k = 1 to the dataset, minimizing the residual sum of squares. Plot the fitted polynomial using black color.
(d) Repeat the steps (b) to (c) 30 times. Keep all the plotted polynomials.
(e) Repeat the experiment for k = 3,5,7,9,11. For each k, use a new figure. Also, on each figure, show the function f(x) using red color.
You need to write the code using Python. You should implement the polynomial fitting function (see the “Linear Regression” lecture notes). You will be graded based on two artifacts: 1) the figures generated (need to be included in the submitted homework report), and 2) the source code submitted.
Problem 2. Use Python to implement the perceptron algorithm and test it on the following data:
Page 1 of 2
where there are 6 points, (x1,x2) is the input, and y is the binary output label. Initialize your algorithm with the vector
θ = [b,w1,w2]T = [0,0,0]T. (5)
Plot the points and the final hyperplane (a line) on the same graph.
Problem 3. The Spambase Data Set contains email spam data for 4601 email messages. Download the data from https://archive.ics.uci.edu/ml/datasets/spambase and divide the data into training set and test set. The training set should contain the first 2/3 of spam messages and first 2/3 of ham (i.e., non-spam) messages. The test set should contain the last 1/3 spam messages and last 1/3 ham messages.
(a) Write a logistic regression program (function) using gradient descent algorithm. Andtrain the weights using training set and then test the result on the test set. Experiment with the step size (learning rate).
(b) Next, normalize the features, so that each feature in the training data has mean 0 andvariance 1. Then run logistic regression on the normalized data.