(a). Consider using linear regression for binary classification with labels $y \in \{0, 1\}$. Here, we use a linear model
$h_\theta(x) = \theta_1 x + \theta_0$
and the squared error loss $L = (h_\theta(x) - y)^2$. The prediction threshold is set to 0.5, i.e., the predicted label is 1 if $h_\theta(x) \ge 0.5$ and 0 if $h_\theta(x) < 0.5$. However, this loss penalizes confident correct predictions, i.e., $h_\theta(x) > 1$ when $y = 1$ or $h_\theta(x) < 0$ when $y = 0$. Some students try to fix this problem by using the absolute error loss $L = |h_\theta(x) - y|$. Will this fix the problem? Please answer the question and explain why. Furthermore, some other students try designing another loss function as follows
.
Although this loss is not complete yet, if it is correct in principle, please complete it and explain how it fixes the problem. Otherwise, please explain why it cannot.
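A small numerical check can make the first question concrete. The sketch below assumes the squared loss $(h_\theta(x) - y)^2$, the absolute loss $|h_\theta(x) - y|$, the 0.5 threshold rule, and a true label $y = 1$; the prediction values are hypothetical and chosen only for illustration.

```python
# Compare squared and absolute error for several predictions h_theta(x) when y = 1.
# The prediction values are hypothetical; all of them are classified as label 1.

def squared_error(h: float, y: float) -> float:
    """Squared error loss (h - y)^2."""
    return (h - y) ** 2


def absolute_error(h: float, y: float) -> float:
    """Absolute error loss |h - y|."""
    return abs(h - y)


def predict_label(h: float) -> int:
    """Threshold rule from the text: label 1 if h >= 0.5, else 0."""
    return 1 if h >= 0.5 else 0


if __name__ == "__main__":
    y = 1
    for h in [0.6, 1.0, 2.0, 3.0]:
        print(f"h={h:.1f} label={predict_label(h)} "
              f"squared={squared_error(h, y):.2f} absolute={absolute_error(h, y):.2f}")
```

Both losses are zero only at $h_\theta(x) = 1$ and become positive again once $h_\theta(x) > 1$, even though the thresholded label stays 1; at $h_\theta(x) = 3$ the squared loss is 4 and the absolute loss is 2. The absolute loss grows linearly rather than quadratically, but it is still nonzero there.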
(b). [5 points] Consider the logistic regression model $h_\theta(x) = \sigma(\theta^T x)$, trained using the binary cross entropy loss function, where $\sigma$ is the sigmoid function. Some students try modifying the original sigmoid function into the following one
The model would still be trained using the binary cross entropy loss. How would the model's prediction rule, as well as the learned model parameters $\theta$, differ from those of conventional logistic regression? Please show your answer and explain it.
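As a baseline for comparing against the modified sigmoid, here is a minimal sketch of conventional logistic regression: the standard sigmoid, binary cross entropy for labels in $\{0, 1\}$, and the usual rule that predicts 1 iff $\sigma(\theta^T x) \ge 0.5$, which is equivalent to $\theta^T x \ge 0$. The parameter and input values in the usage lines are hypothetical, with the bias folded into $\theta$ via a leading input of 1.

```python
import math


def sigmoid(z: float) -> float:
    """Standard logistic sigmoid 1 / (1 + e^{-z}), computed without overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def bce(p: float, y: int, eps: float = 1e-12) -> float:
    """Binary cross entropy for label y in {0, 1} and predicted probability p (clipped away from 0 and 1)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def predict(theta, x) -> int:
    """Conventional rule: label 1 iff sigmoid(theta^T x) >= 0.5, i.e. iff theta^T x >= 0."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1 if sigmoid(z) >= 0.5 else 0


if __name__ == "__main__":
    theta = [1.0, -2.0]                 # hypothetical parameters, bias first
    print(predict(theta, [1.0, 0.25]))  # z = 0.5  -> 1
    print(predict(theta, [1.0, 1.0]))   # z = -1.0 -> 0
    print(bce(sigmoid(0.5), 1))         # loss for a correct but not very confident prediction
```

Because the decision boundary of the standard model is exactly $\theta^T x = 0$, any change to the sigmoid can be checked against this sketch by asking where the modified function crosses 0.5.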
Consider using logistic regression for classification problems. Four 3-dimensional data points $(x_1, x_2, x_3)^T$ and the corresponding labels $y_i$ are given as follows.
Data point    x1       x2       x3       y
D1           -0.120    0.300   -0.010    1
D2            0.200   -0.030   -0.350   -1
D3           -0.370    0.250    0.070   -1
D4           -0.100    0.140   -0.520    1
The learning rate $\alpha$ is set to 0.2 and the initial parameter $\theta[0]$ is set to $[-0.09, 0, -0.19, -0.21]$. Please answer the following questions.
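For reference, one common way to write the loss, gradient, update, and prediction rule for labels $y_i \in \{-1, +1\}$ is given below, writing the learning rate as $\alpha$. It assumes the loss is averaged over the $m = 4$ points and that each input is augmented as $\tilde{x}_i = (1, x_{i1}, x_{i2}, x_{i3})^T$, so the first entry of $\theta[0]$ acts as the bias; the course may instead use a sum over the points or a different parameter ordering.

```latex
\begin{aligned}
J(\theta) &= \frac{1}{m}\sum_{i=1}^{m} \log\!\left(1 + e^{-y_i \theta^T \tilde{x}_i}\right) \\
\nabla_\theta J(\theta) &= -\frac{1}{m}\sum_{i=1}^{m} \sigma\!\left(-y_i \theta^T \tilde{x}_i\right) y_i \tilde{x}_i \\
\theta[t+1] &= \theta[t] - \alpha \, \nabla_\theta J(\theta[t]), \qquad \alpha = 0.2 \\
\hat{y}_i &= \begin{cases} +1, & \theta^T \tilde{x}_i \ge 0 \\ -1, & \text{otherwise} \end{cases}
\end{aligned}
```

The prediction rule follows from $\sigma(z) \ge 0.5 \iff z \ge 0$, and the gradient uses $\frac{d}{dz}\log(1 + e^{-yz}) = -y\,\sigma(-yz)$.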
a) [5 points] Calculate the initial predicted label for each data point.
b) [10 points] Calculate the parameters after the first and second iterations, i.e., $\theta[1]$ and $\theta[2]$, using the gradient descent algorithm.
c) [5 points] Implement the gradient descent algorithm in Python to update the parameters $\theta$. Please plot how the loss function $J(\theta)$ changes over 50,000 iterations and upload the source code file. Note: for a) and b), the detailed calculation process is required, and the intermediate and final results should be rounded to 3 decimal places.
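A minimal NumPy sketch of batch gradient descent for these questions follows. It rests on assumptions the problem does not state: the loss is the mean logistic loss for labels in $\{-1, +1\}$, $\theta[0]$ is ordered as [bias, $w_1$, $w_2$, $w_3$] with a constant 1 prepended to each data point, matplotlib is used only if it is installed, and `loss_curve.png` is a hypothetical output filename. It also prints the initial labels and $\theta[1]$, $\theta[2]$ as a cross-check for a) and b); because it computes in full precision and rounds only when printing, the third decimal may differ slightly from a hand calculation that rounds every intermediate result.

```python
import numpy as np

# Data from the table: rows are (x1, x2, x3); labels are in {-1, +1}.
X_RAW = np.array([
    [-0.120, 0.300, -0.010],
    [0.200, -0.030, -0.350],
    [-0.370, 0.250, 0.070],
    [-0.100, 0.140, -0.520],
])
Y = np.array([1.0, -1.0, -1.0, 1.0])

# Assumption: theta = [bias, w1, w2, w3], so a constant 1 is prepended to each x.
X = np.hstack([np.ones((X_RAW.shape[0], 1)), X_RAW])
THETA0 = np.array([-0.09, 0.0, -0.19, -0.21])
ALPHA = 0.2


def sigmoid(z):
    """Logistic sigmoid applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))


def loss(theta):
    """Assumed loss: J(theta) = mean(log(1 + exp(-y * theta^T x))), computed stably."""
    return float(np.mean(np.logaddexp(0.0, -Y * (X @ theta))))


def gradient(theta):
    """Gradient of J: -mean(sigmoid(-y * theta^T x) * y * x)."""
    coef = -Y * sigmoid(-Y * (X @ theta))
    return (coef[:, None] * X).mean(axis=0)


def predict(theta):
    """Predicted labels: +1 if theta^T x >= 0 (i.e. sigmoid >= 0.5), else -1."""
    return np.where(X @ theta >= 0, 1, -1)


def gradient_descent(theta, alpha, n_iters):
    """Batch gradient descent; records J(theta) at the start and after every update."""
    history = [loss(theta)]
    for _ in range(n_iters):
        theta = theta - alpha * gradient(theta)
        history.append(loss(theta))
    return theta, history


if __name__ == "__main__":
    print("initial labels:", predict(THETA0))
    theta1, _ = gradient_descent(THETA0, ALPHA, 1)
    theta2, _ = gradient_descent(THETA0, ALPHA, 2)
    print("theta[1] =", np.round(theta1, 3))
    print("theta[2] =", np.round(theta2, 3))

    theta_final, history = gradient_descent(THETA0, ALPHA, 50000)
    print("final loss:", round(history[-1], 3))
    try:
        import matplotlib.pyplot as plt
    except ImportError:
        plt = None  # plotting is optional; the loss values remain in `history`
    if plt is not None:
        plt.plot(history)
        plt.xlabel("iteration")
        plt.ylabel("J(theta)")
        plt.title("Loss during gradient descent")
        plt.savefig("loss_curve.png")  # hypothetical filename
```

Recording the loss before the first update means the history holds 50,001 values, so the plotted curve starts at $J(\theta[0])$.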