COMP4434 Assignment #2 Solution


(a). Consider using linear regression for binary classification with labels $y \in \{0, 1\}$. We use the linear model $h_\theta(x) = \theta_1 x + \theta_0$ and the squared error loss $L = (h_\theta(x) - y)^2$. The prediction threshold is 0.5: the predicted label is 1 if $h_\theta(x) \ge 0.5$ and 0 if $h_\theta(x) < 0.5$. However, this loss penalizes confident correct predictions, i.e., $h_\theta(x) > 1$ when $y = 1$, or $h_\theta(x) < 0$ when $y = 0$. Some students try to fix this problem by using the absolute error loss $L = |h_\theta(x) - y|$. Will this fix the problem? Please answer the question and explain why. Furthermore, other students design another loss function as follows:

Although this loss function is incomplete, if it is correct in principle, please complete it and explain how it fixes the problem. Otherwise, please explain why it cannot.
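A quick numeric check of the premise: for confident correct predictions, both the squared and the absolute error losses stay positive, so switching to the absolute loss changes how fast the penalty grows (linearly instead of quadratically) but does not remove it. The function names are illustrative, not part of the assignment.

```python
def squared_loss(h: float, y: int) -> float:
    """Squared error loss (h - y)^2."""
    return (h - y) ** 2


def absolute_loss(h: float, y: int) -> float:
    """Absolute error loss |h - y|."""
    return abs(h - y)


# Confident correct predictions: h > 1 with y = 1, and h < 0 with y = 0.
# The 0.5 threshold classifies all of them correctly, yet both losses are positive.
cases = [(2.0, 1), (3.0, 1), (-1.0, 0), (-2.0, 0)]
for h, y in cases:
    predicted = 1 if h >= 0.5 else 0
    print(f"h={h:+.1f}, y={y}, predicted={predicted}, "
          f"squared={squared_loss(h, y):.1f}, absolute={absolute_loss(h, y):.1f}")
```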

(b). [5 points] Consider the logistic regression model $h_\theta(x) = g(\theta^T x)$, trained with the binary cross-entropy loss, where $g(z) = \frac{1}{1 + e^{-z}}$ is the sigmoid function. Some students modify the original sigmoid function into the following one:

The model is still trained with the binary cross-entropy loss. How would the model's prediction rule, and the learned parameters $\theta$, differ from those of conventional logistic regression? Please show your answer and explanation.
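As a baseline for the comparison, here is a minimal Python sketch of conventional logistic regression only: the standard sigmoid, binary cross-entropy with labels in {0, 1}, and the 0.5-threshold prediction rule. It does not implement the students' modified sigmoid. Because $g$ is increasing with $g(0) = 0.5$, the conventional rule reduces to checking $\theta^T x \ge 0$. The function names and the example `theta` are illustrative assumptions.

```python
import math


def sigmoid(z: float) -> float:
    """Standard logistic function g(z) = 1 / (1 + e^{-z}), computed stably."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce(h: float, y: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy for a label y in {0, 1}; eps guards against log(0)."""
    h = min(max(h, eps), 1.0 - eps)
    return -(y * math.log(h) + (1 - y) * math.log(1.0 - h))


def predict(theta: list[float], x: list[float]) -> int:
    """Conventional rule: predict 1 iff g(theta^T x) >= 0.5."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1 if sigmoid(z) >= 0.5 else 0


# Hypothetical parameters; checks that thresholding g at 0.5
# agrees with thresholding theta^T x at 0.
theta = [0.5, -1.0]
for x in ([1.0, 0.2], [1.0, 0.7], [1.0, 0.5]):
    z = sum(t * xi for t, xi in zip(theta, x))
    assert predict(theta, x) == (1 if z >= 0 else 0)
```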





Consider using logistic regression for classification problems. Four 3-dimensional data points $(x_1, x_2, x_3)_i$ and the corresponding labels $y_i$ are given as follows.
Data point    x1       x2       x3       y
D1          -0.120    0.300   -0.010     1
D2           0.200   -0.030   -0.350    -1
D3          -0.370    0.250    0.070    -1
D4          -0.100    0.140   -0.520     1

The learning rate $\eta$ is set to 0.2, and the initial parameter vector is $\theta^{[0]} = [-0.09, 0, -0.19, -0.21]$. Please answer the following questions.
a) [5 points] Calculate the initial predicted label for each data point.
b) [10 points] Calculate the parameters after the first and second iterations, i.e., $\theta^{[1]}$ and $\theta^{[2]}$, using the gradient descent algorithm.
c) [5 points] Implement the gradient descent algorithm in Python to update the parameters $\theta$. Please plot how the loss function $J(\theta)$ changes over 50,000 iterations and upload the source code file.
Note: for a) and b), the detailed calculation process is required, and intermediate and final results should be rounded to 3 decimal places.
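The three tasks above can be sketched in one Python script. This is a minimal sketch under assumptions the assignment text does not state: (1) the first entry of $\theta^{[0]}$ is the bias $\theta_0$, so each point is augmented with a leading 1; (2) with labels in {-1, +1}, the loss is the mean logistic loss $J(\theta) = \frac{1}{m}\sum_i \log(1 + e^{-y_i \theta^T x_i})$, which has the same gradient as binary cross-entropy after mapping -1 to 0; (3) a point is labelled +1 when $g(\theta^T x) \ge 0.5$ and -1 otherwise. The code keeps full floating-point precision, so its $\theta^{[1]}$ and $\theta^{[2]}$ may differ in the third decimal from a hand calculation that rounds every intermediate step. All helper names (`predict`, `loss`, `gradient`, `step`, `train`, `plot_loss`) are hypothetical.

```python
import math

# Table data with a leading 1.0 per point.
# ASSUMPTION: theta = [theta0 (bias), theta1, theta2, theta3].
X = [
    [1.0, -0.120, 0.300, -0.010],
    [1.0, 0.200, -0.030, -0.350],
    [1.0, -0.370, 0.250, 0.070],
    [1.0, -0.100, 0.140, -0.520],
]
Y = [1, -1, -1, 1]
THETA0 = [-0.09, 0.0, -0.19, -0.21]
ETA = 0.2


def dot(a: list[float], b: list[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function g(z) = 1 / (1 + e^{-z})."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(t: float) -> float:
    """Numerically stable log(1 + e^t)."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def predict(theta: list[float], x: list[float]) -> int:
    """Assumed rule: +1 if g(theta^T x) >= 0.5, i.e. theta^T x >= 0; otherwise -1."""
    return 1 if dot(theta, x) >= 0 else -1


def loss(theta: list[float]) -> float:
    """Assumed J(theta) = (1/m) * sum_i log(1 + exp(-y_i * theta^T x_i))."""
    return sum(softplus(-y * dot(theta, x)) for x, y in zip(X, Y)) / len(X)


def gradient(theta: list[float]) -> list[float]:
    """dJ/dtheta = -(1/m) * sum_i y_i * x_i * g(-y_i * theta^T x_i)."""
    grad = [0.0] * len(theta)
    for x, y in zip(X, Y):
        coeff = -y * sigmoid(-y * dot(theta, x)) / len(X)
        for j, xj in enumerate(x):
            grad[j] += coeff * xj
    return grad


def step(theta: list[float]) -> list[float]:
    """One gradient descent update: theta <- theta - eta * grad J(theta)."""
    return [t - ETA * g for t, g in zip(theta, gradient(theta))]


def train(theta: list[float], iterations: int) -> tuple[list[float], list[float]]:
    """Run gradient descent, recording J(theta) before the first and after every update."""
    history = [loss(theta)]
    for _ in range(iterations):
        theta = step(theta)
        history.append(loss(theta))
    return theta, history


def plot_loss(history: list[float]) -> None:
    import matplotlib.pyplot as plt  # imported here so the rest runs without matplotlib

    plt.plot(history)
    plt.xlabel("iteration")
    plt.ylabel("J(theta)")
    plt.title("Loss during gradient descent")
    plt.show()


# a) initial predicted labels; b) two updates; c) 50,000 updates.
initial_labels = [predict(THETA0, x) for x in X]
theta1 = step(THETA0)
theta2 = step(theta1)
print("initial labels:", initial_labels)
print("theta[1] =", [round(t, 3) for t in theta1])
print("theta[2] =", [round(t, 3) for t in theta2])
theta_final, history = train(THETA0, 50_000)
print("final J(theta) =", round(history[-1], 6))
# plot_loss(history)  # requires matplotlib
```

Plotting sits in its own function with a local matplotlib import, so the numeric parts run without it; calling `plot_loss(history)` draws the loss curve for part c).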