CSCE633-Homework 1 Solved

Question 1
1-dimensional linear regression: Assume a 1-dimensional linear regression model $y = w_0 + w_1 x$. The residual sum of squares (RSS) of the training data $D_{train} = \{(x_1,y_1),\dots,(x_N,y_N)\}$ can be written as:

$$RSS(w_0,w_1) = \sum_{n=1}^{N} (y_n - w_0 - w_1 x_n)^2$$

We estimate the weights $w_0$, $w_1$ by minimizing the above error.

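(a) Show that minimizing RSS results in the following closed-form expression:

$$w_1 = \frac{\sum_{n=1}^{N} x_n y_n - \frac{1}{N}\left(\sum_{n=1}^{N} x_n\right)\left(\sum_{n=1}^{N} y_n\right)}{\sum_{n=1}^{N} x_n^2 - \frac{1}{N}\left(\sum_{n=1}^{N} x_n\right)^2}, \qquad w_0 = \frac{1}{N}\sum_{n=1}^{N} y_n - w_1 \frac{1}{N}\sum_{n=1}^{N} x_n$$

Tip: Set the partial derivatives $\partial RSS/\partial w_0$ and $\partial RSS/\partial w_1$ equal to 0. Then solve a 2 × 2 system of linear equations with respect to $w_0$ and $w_1$.

(b) Show that the above expressions for $w_0$ and $w_1$ are equivalent to the following:

$$w_1 = \frac{\sum_{n=1}^{N} (x_n - \bar{x})(y_n - \bar{y})}{\sum_{n=1}^{N} (x_n - \bar{x})^2}, \qquad w_0 = \bar{y} - w_1 \bar{x}$$

where $\bar{x}$ and $\bar{y}$ are the sample means of the input features and outcome values, respectively.

(c) How would you interpret the above expression in terms of the descriptive statistics (e.g. sample mean, variance, covariance) of populations $x$ and $y$?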
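As a numerical sanity check on Question 1, the sketch below computes the slope as the sample covariance of the inputs and outcomes divided by the sample variance of the inputs, and the intercept from the two sample means. The function name `fit_line` and the synthetic data are assumptions made for illustration; `np.polyfit` is used only as an independent reference.

```python
import numpy as np

def fit_line(x, y):
    """Closed-form least-squares fit of y = w0 + w1 * x (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    # Slope: centered cross-products over centered squares,
    # i.e. sample covariance divided by sample variance of x.
    w1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # Intercept: forces the fitted line through the point of means.
    w0 = y_bar - w1 * x_bar
    return w0, w1

# Synthetic data (assumed for the demo): y = 2 + 0.5 x + noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=200)

w0, w1 = fit_line(x, y)
# np.polyfit returns coefficients from highest degree to lowest.
slope_ref, intercept_ref = np.polyfit(x, y, deg=1)
print(w0, w1, intercept_ref, slope_ref)
```

The two pairs of coefficients should agree up to floating-point error, which is what the equivalence in the question predicts.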
Question 2
Principled method for learning the step size in gradient descent: In class we discussed that when we use gradient descent to minimize a target function J(w) with respect to w, the step size α(k) at iteration k is a crucial hyperparameter. We further said that we can determine α(k) experimentally through cross-validation. There is also a principled way of computing the optimal α(k) at each iteration, and we are going to derive the expression for it.

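According to the Taylor series expansion, a differentiable function $f(\mathbf{x})$ can be written around $\mathbf{x}_0$ as follows:

$$f(\mathbf{x}) = f(\mathbf{x}_0) + (\mathbf{x} - \mathbf{x}_0)^T \nabla f|_{\mathbf{x}=\mathbf{x}_0} + \frac{1}{2} (\mathbf{x} - \mathbf{x}_0)^T H_f|_{\mathbf{x}=\mathbf{x}_0} (\mathbf{x} - \mathbf{x}_0) + \dots$$

where $\nabla f$ is the gradient vector and $H_f$ is the Hessian matrix of $f$, both evaluated at $\mathbf{x}_0$.

(a) Let $\mathbf{w}(k)$ be the value of $\mathbf{w}$ at the $k$th iteration of gradient descent. Show that the second-order Taylor expansion of the target function $J(\mathbf{w})$ around $\mathbf{w}(k)$ is the following:

$$J(\mathbf{w}) \approx J(\mathbf{w}(k)) + (\mathbf{w} - \mathbf{w}(k))^T \nabla J|_{\mathbf{w}=\mathbf{w}(k)} + \frac{1}{2} (\mathbf{w} - \mathbf{w}(k))^T H_J|_{\mathbf{w}=\mathbf{w}(k)} (\mathbf{w} - \mathbf{w}(k))$$

where $\nabla J$ is the gradient vector and $H_J$ is the Hessian matrix of $J$, both evaluated at $\mathbf{w}(k)$.

(b) Show that the above expression of $J(\mathbf{w})$ evaluated at $\mathbf{w}(k+1)$ (i.e. at the $(k+1)$th gradient descent iteration) can be written as:

$$J(\mathbf{w}(k+1)) \approx J(\mathbf{w}(k)) - \alpha(k) \, \|\nabla J|_{\mathbf{w}=\mathbf{w}(k)}\|_2^2 + \frac{1}{2} \alpha(k)^2 \, \nabla J|_{\mathbf{w}=\mathbf{w}(k)}^T \, H_J|_{\mathbf{w}=\mathbf{w}(k)} \, \nabla J|_{\mathbf{w}=\mathbf{w}(k)}$$

Tip: Take into account the gradient descent update rule $\mathbf{w}(k+1) = \mathbf{w}(k) - \alpha(k) \cdot \nabla J|_{\mathbf{w}=\mathbf{w}(k)}$.

(c) Show that minimizing the above expression with respect to the step size $\alpha(k)$ results in:

$$\alpha(k) = \frac{\|\nabla J|_{\mathbf{w}=\mathbf{w}(k)}\|_2^2}{\nabla J|_{\mathbf{w}=\mathbf{w}(k)}^T \, H_J|_{\mathbf{w}=\mathbf{w}(k)} \, \nabla J|_{\mathbf{w}=\mathbf{w}(k)}}$$

The above expression gives a closed-form solution for the step size at iteration $k$ (i.e. $\alpha(k)$) that minimizes the second-order approximation of the target function at the next iteration.

(d) What is the cost of computing $\alpha(k)$ at each iteration $k$ using the above expression?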
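To make the step-size rule in Question 2 concrete, here is a minimal sketch on a quadratic objective, where the second-order expansion is exact. It assumes J(w) = 0.5 wᵀAw − bᵀw with a symmetric positive-definite A, so the gradient is Aw − b and the Hessian is A; the step at each iteration is the squared gradient norm divided by the gradient-Hessian-gradient product. The function name `gd_optimal_step`, the matrix `A`, and the vector `b` are illustrative choices, not part of the assignment.

```python
import numpy as np

def gd_optimal_step(A, b, w_init, n_iters=200, tol=1e-12):
    """Gradient descent on J(w) = 0.5 * w^T A w - b^T w (A assumed SPD),
    using the per-iteration step alpha = (g^T g) / (g^T H g) with H = A."""
    w = np.asarray(w_init, dtype=float).copy()
    for _ in range(n_iters):
        g = A @ w - b            # gradient of J at the current iterate
        gg = g @ g               # squared Euclidean norm of the gradient
        if gg < tol:             # stop once the gradient is (numerically) zero
            break
        alpha = gg / (g @ A @ g) # step size minimizing the quadratic model
        w = w - alpha * g        # standard gradient descent update
    return w

# Small demo problem (assumed): the minimizer solves A w = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
w_star = gd_optimal_step(A, b, w_init=np.zeros(2))
w_ref = np.linalg.solve(A, b)
print(w_star, w_ref)
```

For a general J, the Hessian changes with w and has to be re-evaluated at every iterate, which is the overhead part (d) asks about.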

Question 3
Predicting forest fires: Forest fires are a major environmental issue endangering human lives. This makes their fast detection a key element for controlling and potentially preventing them. Since it is hard for humans to monitor all forests, we can use automatic tools based on local sensors instead. Through these sensors we can get information about the meteorological conditions, such as temperature, wind, relative humidity (RH), and amount of rain. We can also compute several fire hazard indexes, such as the forest fire weather index (FWI), fine fuel moisture code (FFMC), duff moisture code (DMC), drought code (DC), and initial spread index (ISI). Using these measures, we can predict whether a fire is going to occur in the forest, as well as estimate the amount of burned area. Such data are part of the “Forest Fires Data Set” of the UCI Machine Learning Repository and their description can be found here: http://archive.ics.uci.edu/ml/datasets/Forest+Fires.

Inside the “Homework 1” folder on Piazza you can find two files containing the train and test data (named “train.csv” and “test.csv”) for our experiments. The rows of these files refer to the data samples, while the columns denote the features (columns 1-12) and the outcome variable (column 13), as described below:

X: x-axis spatial coordinate of the forest: 1 to 9
Y: y-axis spatial coordinate of the forest: 2 to 9
month: month of the year: 1 to 12 to denote “jan” to “dec”
day: day of the week: 1 to 7 to denote “mon” to “sun”
FFMC: FFMC index from the FWI system
DMC: DMC index from the FWI system
DC: DC index from the FWI system
ISI: ISI index from the FWI system
temp: temperature in Celsius degrees
RH: relative humidity
wind: wind speed in km/h
rain: outside rain in mm/m2
area: the burned area of the forest (this is the outcome variable)
(a) Data exploration: Inspect the input features (e.g. you can plot histograms, scatter plots, etc.). Which of the features are continuous and which are categorical?
(b) Classification: From data exploration, we can notice that the outcome value (i.e. the burned area) is zero for many samples, meaning that the corresponding forests are not affected by fire. Therefore we can dichotomize the outcome variable, based on whether its value is zero or greater than zero. This creates the following two classes:
Class 0: Forests not affected by the fire, i.e. area = 0

Class 1: Forests affected by the fire, i.e. area > 0

After dichotomizing the outcome variable, we can run a classification task to predict whether or not fire will occur in a certain forest based on the input features.

(b.i) Implement a K-Nearest Neighbor classifier (K-NN) using the Euclidean distance as a distance measure to perform the above binary classification task. Reminder: Don’t forget to normalize the features.
(b.ii) Explore different values of K through cross-validation on the training set. Plot the classification accuracy, i.e. (#samples correctly classified) / (total #samples), against the different values of K.
(b.iii) Report the classification accuracy on the test set using the best K from cross-validation.
(b.iv) Bonus: Instead of using the Euclidean distance for all features, experiment with different types of distances or distance combinations, e.g. Hamming distance for categorical features. Report your findings.
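The classification steps above (dichotomize the outcome, normalize, classify with K-NN, pick K by cross-validation) can be sketched as follows. This is a minimal NumPy-only illustration under stated assumptions: the helper names (`zscore_fit`, `knn_predict`, `cv_accuracy`) are hypothetical, z-score normalization is one of several reasonable choices, ties in the vote go to class 0, and the data are synthetic stand-ins rather than the actual train.csv.

```python
import numpy as np

def zscore_fit(X):
    """Per-feature mean and standard deviation, computed on training data only."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # avoid division by zero for constant features
    return mu, sigma

def knn_predict(X_train, y_train, X_query, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    # Squared distances have the same ordering as Euclidean distances.
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nn_idx = np.argsort(d2, axis=1)[:, :k]
    votes = y_train[nn_idx].mean(axis=1)  # fraction of neighbors in class 1
    return (votes > 0.5).astype(int)      # ties resolve to class 0

def cv_accuracy(X, y, k, n_folds=5, seed=0):
    """Mean validation accuracy of K-NN over n_folds shuffled folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    accs = []
    for i in range(n_folds):
        val = folds[i]
        trn = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        mu, sigma = zscore_fit(X[trn])    # normalize with training-fold statistics
        pred = knn_predict((X[trn] - mu) / sigma, y[trn], (X[val] - mu) / sigma, k)
        accs.append(np.mean(pred == y[val]))
    return float(np.mean(accs))

# Synthetic stand-in data (assumed): two well-separated groups of samples.
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(0, 1, (60, 3)), rng.normal(3, 1, (60, 3))])
area_demo = np.concatenate([np.zeros(60), rng.uniform(0.1, 50, 60)])
y_demo = (area_demo > 0).astype(int)      # class 1 if area > 0, else class 0

scores = {k: cv_accuracy(X_demo, y_demo, k) for k in (1, 3, 5, 7, 9)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Normalization statistics are computed inside each fold so that the validation samples do not influence the scaling. With the real data, the same `best_k` would then be used once on the test set.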
(c) Linear Regression: Among the forests that were affected by the fire, we can use linear regression to predict the actual amount of area that was burned. For this task, we will only use the samples of the train and test set with burned area (column 13) greater than zero, i.e. area > 0. Plot the histogram of the outcome variable. What do you observe? Plot the histogram of the logarithm of the outcome value, i.e. log(area). What do you observe now?
(c.i) Implement a linear regression model to fit the outcome data using the ordinary least squares (OLS) solution.
(c.ii) Test your model on the test data and compute the residual sum of squares error (RSS) and the correlation between the actual and predicted outcome variable.
(c.iii) Bonus: Experiment with different non-linear functions of the input features. Report your findings on the train and test sets.
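The regression part can be sketched along the following lines: fit OLS weights with an intercept, predict on held-out data, and report RSS and the Pearson correlation between actual and predicted values. Several things here are assumptions rather than requirements of the assignment: the helper names (`ols_fit`, `ols_predict`, `rss`) are hypothetical, the target is taken to be log(area) as the histogram prompt suggests, `np.linalg.lstsq` is used in place of explicitly inverting XᵀX (it returns the same least-squares solution when X has full column rank), and the data are synthetic.

```python
import numpy as np

def ols_fit(X, y):
    """Least-squares weights for y ~ w0 + X @ w[1:], via np.linalg.lstsq."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend a bias column
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def ols_predict(X, w):
    """Predictions of the linear model with bias term w[0]."""
    return np.column_stack([np.ones(len(X)), X]) @ w

def rss(y_true, y_pred):
    """Residual sum of squares."""
    return float(np.sum((y_true - y_pred) ** 2))

# Synthetic stand-in for the area > 0 samples (assumed): 4 features,
# with the target playing the role of log(area).
rng = np.random.default_rng(0)
X_tr = rng.normal(size=(80, 4))
X_te = rng.normal(size=(40, 4))
w_true = np.array([1.0, 0.5, -0.3, 0.0, 0.2])   # [bias, w1, ..., w4]
log_area_tr = ols_predict(X_tr, w_true) + rng.normal(scale=0.1, size=80)
log_area_te = ols_predict(X_te, w_true) + rng.normal(scale=0.1, size=40)

w_hat = ols_fit(X_tr, log_area_tr)
pred_te = ols_predict(X_te, w_hat)
test_rss = rss(log_area_te, pred_te)
test_corr = float(np.corrcoef(log_area_te, pred_te)[0, 1])
print(w_hat, test_rss, test_corr)
```

For the bonus, non-linear functions of the inputs (for example squared terms) could be appended as extra columns of `X_tr` and `X_te` before calling `ols_fit`; the model stays linear in the weights.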
