Instructions: There are four problems. Partial credit is given for answers that are partially correct. No credit is given for answers that are wrong or illegible. Write neatly.
You must submit two PDFs on D2L. The first PDF contains the answers to the analytical questions as well as figures that are generated
1 Linear Classifier with a Margin
Show that, regardless of the dimensionality of the feature vectors, a data set with just two data points, one from each class, is sufficient to determine the location of the maximum-margin hyperplane. Hint #1: Consider a data set of two data points, $\mathbf{x}_1 \in \mathcal{C}_1$ ($y_1 = +1$) and $\mathbf{x}_2 \in \mathcal{C}_2$ ($y_2 = -1$), set up the minimization problem (for computing the hyperplane) with appropriate constraints on $\mathbf{w}^T\mathbf{x}_1 + b$ and $\mathbf{w}^T\mathbf{x}_2 + b$, and solve it. Hint #2: This can be formulated as a constrained optimization problem.
$$\arg\min_{\mathbf{w} \in \mathbb{R}^p} \; \frac{1}{2}\|\mathbf{w}\|_2^2$$
Subject to: (some constraint)
What are $\mathbf{w}$ and $b$? Hint: What are the constraints? How did we solve the constrained optimization problem in Fisher’s linear discriminant (see the Linear Models Lecture Notes, or constrained optimization from Calculus)?
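One way to sanity-check an analytical answer numerically is sketched below. It assumes that at the optimum both constraints, $\mathbf{w}^T\mathbf{x}_1 + b \ge 1$ and $\mathbf{w}^T\mathbf{x}_2 + b \le -1$, hold with equality and that $\mathbf{w}$ is parallel to $\mathbf{x}_1 - \mathbf{x}_2$ (any component orthogonal to that direction only increases $\|\mathbf{w}\|$). The helper names `two_point_max_margin` and `decision` are hypothetical, and the check is not a substitute for the derivation.

```python
import math

def two_point_max_margin(x1, x2):
    """Max-margin hyperplane (w, b) for x1 with y=+1 and x2 with y=-1.

    Assumes both constraints w.x1 + b >= 1 and w.x2 + b <= -1 are active at
    the optimum and that w is parallel to x1 - x2.
    """
    d = [a - c for a, c in zip(x1, x2)]           # x1 - x2
    dd = sum(v * v for v in d)                    # ||x1 - x2||^2
    if dd == 0:
        raise ValueError("x1 and x2 must be distinct points")
    w = [2.0 * v / dd for v in d]                 # enforces w.(x1 - x2) = 2
    mid = [(a + c) / 2.0 for a, c in zip(x1, x2)]
    b = -sum(wi * mi for wi, mi in zip(w, mid))   # plane passes through the midpoint
    return w, b

def decision(w, b, x):
    """Signed decision value w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# The dimension p is arbitrary; here p = 3.
x1, x2 = [3.0, 1.0, 0.0], [1.0, 1.0, 2.0]
w, b = two_point_max_margin(x1, x2)
print(decision(w, b, x1), decision(w, b, x2))   # 1.0 -1.0
# The margin width 2/||w|| equals the distance between the two points.
print(math.isclose(2 / math.sqrt(sum(v * v for v in w)), math.dist(x1, x2)))  # True
```

Nothing in the sketch depends on $p$: only the difference and midpoint of the two points enter, which is the dimension-independence the problem asks you to show.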
2 Linear Regression with Regularization
In class we derived and discussed linear regression in detail. Find the solution that minimizes the sum of squared errors when an L2 penalty on the weights is added to the loss. More formally,
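$$\arg\min_{\mathbf{w}} \; \sum_{i=1}^{n} \left(y_i - \mathbf{w}^T\mathbf{x}_i\right)^2 + \lambda \|\mathbf{w}\|_2^2$$
where $\lambda > 0$ controls the strength of the penalty.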
How does this change the original linear regression solution? What is the impact of adding this penalty?
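A minimal sketch for experimenting with the penalty, assuming the objective is the sum of squared errors plus $\lambda\|\mathbf{w}\|_2^2$ with no separate (unpenalized) intercept. Setting the gradient $-2X^T(\mathbf{y} - X\mathbf{w}) + 2\lambda\mathbf{w}$ to zero gives the linear system $(X^TX + \lambda I)\mathbf{w} = X^T\mathbf{y}$, which the code solves with Gaussian elimination. The function names `solve` and `ridge` and the toy data are illustrative only.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]      # augmented matrix [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):                    # eliminate below the pivot
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                     # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def ridge(X, y, lam):
    """Weights solving (X^T X + lam * I) w = X^T y; rows of X are x_i^T."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
print(ridge(X, y, 0.0))   # [1.0, 2.0]: exact fit, same as ordinary least squares
print(ridge(X, y, 1.0))   # roughly [0.875, 1.375]: weights shrunk toward zero
```

With $\lambda = 0$ the system reduces to the ordinary normal equations, which fail when $X^TX$ is singular; any $\lambda > 0$ makes $X^TX + \lambda I$ positive definite and hence invertible.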
Write your own implementation of logistic regression and apply your model to either real-world data (see Github data sets: https://github.com/gditzler/UA-ECE-523-Sp2018/tree/master/data) or synthetic data. If you simply use Scikit-learn’s implementation of the logistic regression classifier, then you’ll receive zero points. A full 10/10 will be awarded to those who implement logistic regression by minimizing the cross-entropy loss with stochastic gradient descent.
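A minimal sketch of logistic regression trained by stochastic gradient descent on the cross-entropy loss, assuming labels in $\{0, 1\}$. For one sample, the gradient of the cross-entropy with respect to the logit is $\sigma(\mathbf{w}^T\mathbf{x} + b) - y$, so each update moves $\mathbf{w}$ by $-\eta(\sigma(\cdot) - y)\mathbf{x}$. The learning rate, epoch count, and the two-Gaussian synthetic data set are arbitrary illustrative choices, not values prescribed by the assignment.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg_sgd(X, y, lr=0.1, epochs=50, seed=0):
    """Minimize cross-entropy with plain SGD (one sample per step); y in {0, 1}."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)                      # visit samples in a new random order
        for i in idx:
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            g = p - y[i]                      # d(cross-entropy)/d(logit)
            w = [wj - lr * g * xj for wj, xj in zip(w, X[i])]
            b -= lr * g
    return w, b

def predict(w, b, x):
    """Class 1 if the estimated P(y=1|x) is at least 0.5, else class 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else 0

# Hypothetical synthetic data: two Gaussian blobs in 2-D.
rng = random.Random(1)
X = ([[rng.gauss(-2, 1), rng.gauss(-2, 1)] for _ in range(100)]
     + [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(100)])
y = [0] * 100 + [1] * 100
w, b = train_logreg_sgd(X, y)
acc = sum(predict(w, b, x) == t for x, t in zip(X, y)) / len(y)
print(f"training accuracy: {acc:.2f}")
```

Reporting accuracy on a held-out split, rather than on the training data as here, gives a fairer picture of performance.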
3 Density Estimation
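The ECE523 lecture notes have a function for generating a checkerboard data set. Generate checkerboard data from two classes and use any density estimation technique we discussed to classify new data using
$$\hat{y} = \arg\max_{y} \hat{p}_{Y|X}(y|\mathbf{x}) = \arg\max_{y} \frac{\hat{p}_{X|Y}(\mathbf{x}|y)\,\hat{p}_Y(y)}{\hat{p}_X(\mathbf{x})}$$
where $\hat{p}_{Y|X}(y|\mathbf{x})$ is your estimate of the posterior given your estimates of $\hat{p}_{X|Y}(\mathbf{x}|y)$ using a density estimator and $\hat{p}_Y(y)$ using a maximum likelihood estimator. You should plot $\hat{p}_{X|Y}(\mathbf{x}|y)$ using a pseudo color plot (see https://goo.gl/2SDJPL). Note that you must model $\hat{p}_X(\mathbf{x})$, $\hat{p}_Y(y)$, and $\hat{p}_{X|Y}(\mathbf{x}|y)$; $\hat{p}_X(\mathbf{x})$ can be calculated using the Law of Total Probability.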
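A minimal sketch of the full pipeline, assuming a Gaussian kernel density estimate for $\hat{p}_{X|Y}$ (one of several possible techniques). The `checkerboard` function is a hypothetical stand-in, not the generator from the lecture notes. The bandwidth `h=0.04` and the board's square size are arbitrary choices. The prior is the maximum likelihood estimate (class frequency), and $\hat{p}_X(\mathbf{x}) = \sum_y \hat{p}_{X|Y}(\mathbf{x}|y)\,\hat{p}_Y(y)$ by the Law of Total Probability.

```python
import math
import random

def checkerboard(n, a=0.25, seed=0):
    """Hypothetical generator: uniform points in [0, 1)^2 labelled 0/1 by the
    parity of the a-by-a square they fall in."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        u, v = rng.random(), rng.random()
        X.append((u, v))
        y.append((int(u // a) + int(v // a)) % 2)
    return X, y

def kde(points, h):
    """Return a 2-D Gaussian kernel density estimate with bandwidth h."""
    norm = 1.0 / (len(points) * 2 * math.pi * h * h)
    def p(x):
        return norm * sum(math.exp(-((x[0] - q[0]) ** 2 + (x[1] - q[1]) ** 2) / (2 * h * h))
                          for q in points)
    return p

X, y = checkerboard(1000)
classes = sorted(set(y))
prior = {c: y.count(c) / len(y) for c in classes}                          # MLE of p_Y(y)
lik = {c: kde([x for x, t in zip(X, y) if t == c], h=0.04) for c in classes}  # p_X|Y(x|y)

def posterior(x):
    """Posterior p_Y|X(y|x) via Bayes' rule; p_X(x) from total probability."""
    joint = {c: lik[c](x) * prior[c] for c in classes}
    px = sum(joint.values())
    if px == 0.0:              # far from all data: fall back to the prior
        return dict(prior)
    return {c: joint[c] / px for c in classes}

def classify(x):
    post = posterior(x)
    return max(post, key=post.get)

print(classify((0.125, 0.125)), classify((0.375, 0.125)))
```

For the pseudo color plot, one option is to evaluate `lik[c]` on a regular grid of points and pass the resulting array to matplotlib's `pcolormesh`.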
4 Conceptual
The Bayes decision rule describes the approach we take to choosing a class ω for a data point x. This can be achieved by modeling either P(ω|x) directly or P(x|ω)P(ω)/P(x). Compare and contrast these two approaches to modeling, and discuss the advantages and disadvantages of each. For the latter model, why might knowing P(x) be useful?