1 Linear Classifier with a Margin [10pts]
Show that, regardless of the dimensionality of the feature vectors, a data set that has just two data points, one from each class, is sufficient to determine the location of the maximum-margin hyperplane. Hint #1: Consider a data set of two data points, $\mathbf{x}_1 \in C_1$ ($y_1 = +1$) and $\mathbf{x}_2 \in C_2$ ($y_2 = -1$). Set up the minimization problem (for computing the hyperplane) with appropriate constraints on $\mathbf{w}^T\mathbf{x}_1 + b$ and $\mathbf{w}^T\mathbf{x}_2 + b$, and solve it. Hint #2: This can be formulated as a constrained optimization problem:
$$\arg\min_{\mathbf{w}\in\mathbb{R}^p} \ \frac{1}{2}\|\mathbf{w}\|_2^2$$
Subject to: (some constraint)
What are $\mathbf{w}$ and $b$? Hint: What are the constraints? How did we solve the constrained optimization problem for Fisher's linear discriminant (see the Linear Models lecture notes, or constrained optimization from calculus)?
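One way to check a hand derivation numerically is sketched below. It assumes the standard hard-margin objective $\frac{1}{2}\|\mathbf{w}\|_2^2$ with constraints $y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1$, and it assumes both constraints are active (tight) at the optimum, so the Lagrange/KKT conditions reduce to a single linear system in $\mathbf{w}$, $b$, and the two multipliers. The function name `two_point_max_margin` is hypothetical, and the random points are only an illustration; the same code runs for any dimension $p$.

```python
import numpy as np

def two_point_max_margin(x1, x2):
    """Solve the KKT system of  min 0.5*||w||^2  s.t.  y_i (w^T x_i + b) >= 1
    for x1 in C1 (y1 = +1) and x2 in C2 (y2 = -1), assuming both constraints
    are active at the optimum. Returns (w, b, [alpha1, alpha2])."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    p = x1.size
    y1, y2 = 1.0, -1.0
    n = p + 3                      # unknowns: w (p entries), b, alpha1, alpha2
    A = np.zeros((n, n))
    rhs = np.zeros(n)
    # Stationarity in w: w - alpha1*y1*x1 - alpha2*y2*x2 = 0
    A[:p, :p] = np.eye(p)
    A[:p, p + 1] = -y1 * x1
    A[:p, p + 2] = -y2 * x2
    # Stationarity in b: alpha1*y1 + alpha2*y2 = 0
    A[p, p + 1] = y1
    A[p, p + 2] = y2
    # Active margin constraints: y_i (w^T x_i + b) = 1
    A[p + 1, :p] = y1 * x1
    A[p + 1, p] = y1
    A[p + 2, :p] = y2 * x2
    A[p + 2, p] = y2
    rhs[p + 1] = 1.0
    rhs[p + 2] = 1.0
    z = np.linalg.solve(A, rhs)
    return z[:p], z[p], z[p + 1:]

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=5), rng.normal(size=5)   # any dimension p works
w, b, alpha = two_point_max_margin(x1, x2)
print(w @ x1 + b, w @ x2 + b)   # approximately 1.0 and -1.0
```

Both multipliers come out equal and strictly positive, which is consistent with the assumption that both constraints are tight; the resulting $\mathbf{w}$ can then be compared against the closed form you derive.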
2 Linear Regression with Regularization [10pts]
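In class we derived and discussed linear regression in detail. Find the solution that minimizes the sum-of-squared-errors loss with an added L2 penalty on the weights. More formally, find
$$\arg\min_{\mathbf{w}} \ \sum_{i=1}^{n}\left(y_i - \mathbf{w}^T\mathbf{x}_i\right)^2 + \lambda\|\mathbf{w}\|_2^2$$
where $\lambda > 0$ controls the strength of the penalty.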
How does this change the original (unregularized) linear regression solution? What is the impact of adding this penalty?
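A minimal numerical sketch for exploring the question, assuming the objective is written in matrix form with a design matrix $X$ of shape $n \times p$, no intercept term, and synthetic data chosen only for illustration. Setting the gradient of the penalized objective to zero yields a linear system, which is solved directly; the helper name `ridge_fit` is hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize sum_i (y_i - w^T x_i)^2 + lam * ||w||_2^2 (no intercept).
    The gradient -2 X^T (y - X w) + 2 lam w vanishes when
    (X^T X + lam I) w = X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=50)

# lam = 0 recovers ordinary least squares; larger lam shrinks the weights
for lam in [0.0, 1.0, 10.0, 100.0]:
    w = ridge_fit(X, y, lam)
    print(lam, np.round(w, 3), np.linalg.norm(w))
```

Comparing the printed norms across values of $\lambda$ gives a concrete view of the shrinkage effect, and the $\lambda I$ term also keeps the system solvable when $X^T X$ is singular or ill-conditioned.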
Write your own implementation of logistic regression and apply your model to either real-world data (see the GitHub data sets: https://github.com/gditzler/UA-ECE-523-Sp2018/tree/master/data) or synthetic data. If you simply use Scikit-learn's implementation of the logistic regression classifier, you will receive zero points. A full 10/10 will be awarded to those who implement logistic regression by minimizing the cross-entropy loss with stochastic gradient descent.
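A minimal sketch of the approach described above, assuming labels in {0, 1}, synthetic data from two Gaussian blobs, and hand-picked hyperparameters (learning rate 0.1, 50 epochs) that are illustrative rather than tuned. Each update uses a single sample, and the gradient of the cross-entropy with respect to the logit is simply the predicted probability minus the label. The names `fit_logreg_sgd` and `predict` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg_sgd(X, y, lr=0.1, epochs=50, seed=0):
    """Logistic regression trained with SGD on the cross-entropy loss,
    one sample per update. Labels y are assumed to be in {0, 1}."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    w = np.zeros(p)
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):              # reshuffle every epoch
            err = sigmoid(X[i] @ w + b) - y[i]    # d(cross-entropy)/d(logit)
            w -= lr * err * X[i]
            b -= lr * err
    return w, b

def predict(X, w, b):
    return (sigmoid(X @ w + b) >= 0.5).astype(int)

# Synthetic data: two overlapping Gaussian blobs in 2-D
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-1.5, 1.0, size=(200, 2)),
               rng.normal(+1.5, 1.0, size=(200, 2))])
y = np.concatenate([np.zeros(200), np.ones(200)])

w, b = fit_logreg_sgd(X, y)
print("training accuracy:", np.mean(predict(X, w, b) == y))
```

For a real-world data set, the features would typically be standardized first, and accuracy should be reported on held-out data rather than on the training set.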
3 Density Estimation [20pts]
The ECE 523 lecture notes have a function for generating a checkerboard data set. Generate checkerboard data from two classes and use any density estimation technique we discussed to classify new data using
$$\hat{y} = \arg\max_{y} \ \hat{p}_{Y|X}(y|x) = \arg\max_{y} \ \frac{\hat{p}_{X|Y}(x|y)\,\hat{p}_Y(y)}{\hat{p}_X(x)}$$
where $\hat{p}_{Y|X}(y|x)$ is your estimate of the posterior, given your estimates of $\hat{p}_{X|Y}(x|y)$ from a density estimator and $\hat{p}_Y(y)$ from a maximum likelihood estimator. You should plot $\hat{p}_{X|Y}(x|y)$ using a pseudocolor plot (see https://goo.gl/2SDJPL). Note that you must model $\hat{p}_X(x)$, $\hat{p}_Y(y)$, and $\hat{p}_{X|Y}(x|y)$. Note that $\hat{p}_X(x)$ can be calculated using the Law of Total Probability: $\hat{p}_X(x) = \sum_{y} \hat{p}_{X|Y}(x|y)\,\hat{p}_Y(y)$.
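A minimal sketch of the full pipeline, using a Gaussian kernel density estimate (one of several possible density estimators) for $\hat{p}_{X|Y}(x|y)$, class frequencies as the maximum likelihood estimate of $\hat{p}_Y(y)$, and the Law of Total Probability for $\hat{p}_X(x)$. The generator `make_checkerboard` is a hypothetical stand-in, not the function from the lecture notes; the labels 0/1, the 4x4 grid, and the bandwidth `h = 0.05` are all assumptions.

```python
import numpy as np

def make_checkerboard(n, rng, tiles=4):
    """Hypothetical stand-in for the lecture-notes generator: uniform points
    on [0, 1]^2, labeled 0/1 by the parity of the tile they fall in."""
    X = rng.uniform(0, 1, size=(n, 2))
    idx = np.floor(X * tiles).astype(int)
    y = (idx[:, 0] + idx[:, 1]) % 2
    return X, y

def gaussian_kde(train, query, h):
    """Isotropic Gaussian KDE built from `train`, evaluated at each row of `query`."""
    d = train.shape[1]
    sq = ((query[:, None, :] - train[None, :, :]) ** 2).sum(axis=-1)
    norm = (2 * np.pi * h ** 2) ** (d / 2)
    return np.exp(-sq / (2 * h ** 2)).mean(axis=1) / norm

rng = np.random.default_rng(0)
X, y = make_checkerboard(2000, rng)
X_new, y_new = make_checkerboard(500, rng)

h = 0.05                                   # bandwidth (assumed; tune e.g. by cross-validation)
classes = np.array([0, 1])
prior = np.array([np.mean(y == c) for c in classes])         # MLE of p_Y(y)
lik = np.stack([gaussian_kde(X[y == c], X_new, h) for c in classes], axis=1)  # p_{X|Y}(x|y)
evidence = lik @ prior                     # p_X(x) via the law of total probability
posterior = lik * prior / evidence[:, None]                   # p_{Y|X}(y|x)
y_hat = classes[np.argmax(posterior, axis=1)]
print("test accuracy:", np.mean(y_hat == y_new))
```

For the pseudocolor plot, one option is to evaluate `gaussian_kde(X[y == c], grid, h)` on the points of an `np.meshgrid` over $[0, 1]^2$ and pass the reshaped values to Matplotlib's `pcolormesh`, one panel per class.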
4 Conceptual [5pts]
The Bayes decision rule describes the approach we take to choosing a class ω for a data point x. This can be achieved by modeling either P(ω|x) directly or P(x|ω)P(ω)/P(x). Compare and contrast these two approaches to modeling, and discuss the advantages and disadvantages of each. For the latter model, why might knowing P(x) be useful?