Artificial Intelligence Homework 2 Solution

ΥΣ19 Artificial Intelligence II (Deep Learning for Natural Language Processing)
Nikolaos Galanis - sdi1700019
1 Decision boundaries for linear binary classification
Our goal is to prove the following equations, using basic operations from linear algebra.
The two equations concern classification with two variables, $x_1$ and $x_2$. The simplest representation of a linear discriminant function is obtained by taking a linear function of the input vector, so that:
$$y(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + w_0$$
The first equation takes the sub-case of $\mathbf{x}$ lying on the decision surface, where $y(\mathbf{x}) = 0$. The equation can be easily proven by solving $y(\mathbf{x}_0) = 0$ for an $\mathbf{x}_0$ that satisfies the previously mentioned hypothesis, and then dividing both sides by $\|\mathbf{w}\|$, which, as a vector length, is a positive, non-zero number. Thus, the equation is proven as follows:

$$y(\mathbf{x}_0) = 0 \;\Rightarrow\; \mathbf{w}^T\mathbf{x}_0 + w_0 = 0 \;\Rightarrow\; \frac{\mathbf{w}^T\mathbf{x}_0}{\|\mathbf{w}\|} = -\frac{w_0}{\|\mathbf{w}\|}$$
The second equation takes an arbitrary point $\mathbf{x}$ and its orthogonal projection $\mathbf{x}_\perp$ onto the decision surface. Our goal is to prove that

$$\mathbf{x} = \mathbf{x}_\perp + r\,\frac{\mathbf{w}}{\|\mathbf{w}\|} \tag{1}$$
It is quite clear from (1) that $\mathbf{x} - \mathbf{x}_\perp = \vec{a}$. Because both $\vec{a}$ and $\mathbf{w}$ are perpendicular to the hyperplane $y(\mathbf{x}) = 0$, they are parallel to each other, so we can write $\vec{a}$ as:

$$\vec{a} = \lambda\,\mathbf{w} \tag{2}$$
Now our goal is to compute the λ value.
We know that $r$ is the perpendicular distance of $\mathbf{x}$ from the hyperplane, i.e. the norm of the vector $\vec{a}$. Thus, we can write $r$ as:

$$r = \|\vec{a}\| = \|\mathbf{x} - \mathbf{x}_\perp\|$$
Finally, by applying norms to equation (2), we get that

$$r = \|\vec{a}\| = |\lambda|\,\|\mathbf{w}\| \;\Rightarrow\; \lambda = \frac{r}{\|\mathbf{w}\|}$$

(taking $\lambda \geq 0$, i.e. $\mathbf{x}$ lying on the side of the surface towards which $\mathbf{w}$ points). By substituting this value of $\lambda$ in (2), we get that

$$\vec{a} = \frac{r}{\|\mathbf{w}\|}\,\mathbf{w} \;\Rightarrow\; \mathbf{x} = \mathbf{x}_\perp + r\,\frac{\mathbf{w}}{\|\mathbf{w}\|}$$

which is what we wanted to prove.
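As a quick numerical sanity check of the geometry above, the following Python sketch (with arbitrary illustrative values for $\mathbf{w}$, $w_0$ and $\mathbf{x}$ that are not part of the assignment) projects a point onto the decision surface and verifies that the projection satisfies $y(\mathbf{x}_\perp) = 0$, that its distance from $\mathbf{x}$ is $|y(\mathbf{x})| / \|\mathbf{w}\|$, and that equation (1) holds:

    import numpy as np

    # Arbitrary illustrative values; not taken from the assignment
    w = np.array([2.0, -1.0, 0.5])     # weight vector
    w0 = 0.7                           # bias term
    x = np.array([3.0, 1.0, -2.0])     # an arbitrary point

    def y(v):
        """Linear discriminant y(v) = w^T v + w0."""
        return w @ v + w0

    norm_w = np.linalg.norm(w)

    # Orthogonal projection of x onto the decision surface y(v) = 0
    x_perp = x - (y(x) / norm_w**2) * w

    print(np.isclose(y(x_perp), 0.0))            # True: x_perp lies on the surface
    r = np.linalg.norm(x - x_perp)               # perpendicular distance of x
    print(np.isclose(r, abs(y(x)) / norm_w))     # True: r = |y(x)| / ||w||
    print(np.allclose(x, x_perp + (y(x) / norm_w) * (w / norm_w)))  # equation (1), with signed r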
2 Computing Partial Derivatives of the output layer
Let $x \in \mathbb{R}^{1\times n}$ be a row vector, $W \in \mathbb{R}^{n\times m}$ be a matrix and $z = xW$. Our goal is to compute $\frac{\partial z}{\partial x}$.
The resulting matrix after the multiplication of $x$ with $W$ will be:

$$z = \begin{bmatrix} x_1 & x_2 & \dots & x_n \end{bmatrix} \begin{bmatrix} w_{11} & w_{12} & \dots & w_{1m} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & \dots & w_{nm} \end{bmatrix} = \begin{bmatrix} (w_{11}x_1 + w_{21}x_2 + \dots + w_{n1}x_n) & \dots & (w_{1m}x_1 + w_{2m}x_2 + \dots + w_{nm}x_n) \end{bmatrix}$$
Thus, if we take the partial derivatives with respect to $x$, the result, as we can easily see, is the initial matrix $W$. So, $\frac{\partial z}{\partial x} = W$.
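This identity can be checked numerically with a finite-difference approximation of the Jacobian; the sketch below uses small, arbitrary sizes ($n = 3$, $m = 2$) and random values, which are not part of the assignment:

    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 3, 2                       # illustrative sizes
    x = rng.normal(size=(1, n))       # row vector x
    W = rng.normal(size=(n, m))       # weight matrix W

    def z(v):
        """z = vW for a row vector v."""
        return v @ W

    # Finite-difference Jacobian: J[i, j] approximates dz_j / dx_i
    eps = 1e-6
    J = np.zeros((n, m))
    for i in range(n):
        dx = np.zeros((1, n))
        dx[0, i] = eps
        J[i] = (z(x + dx) - z(x - dx))[0] / (2 * eps)

    print(np.allclose(J, W))          # True: dz/dx equals W entrywise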
3 Computing Partial Derivatives of the logistic function
Let $\hat{y} = \sigma(\mathbf{x}^T\mathbf{w})$, where $\mathbf{x}, \mathbf{w} \in \mathbb{R}^{n\times 1}$ are column vectors and $\sigma$ is the logistic function. Our goal is to compute the partial derivatives of $\hat{y}$.
We can easily compute the partial derivatives by using the chain rule on the given equation.
We know, by applying basic differentiation rules, that $\sigma'(t) = \sigma(t)\,(1-\sigma(t))$. Thus, the result will be:

$$\frac{\partial \hat{y}}{\partial \mathbf{w}} = \sigma(\mathbf{x}^T\mathbf{w})\,\big(1-\sigma(\mathbf{x}^T\mathbf{w})\big)\,\mathbf{x}, \qquad \frac{\partial \hat{y}}{\partial \mathbf{x}} = \sigma(\mathbf{x}^T\mathbf{w})\,\big(1-\sigma(\mathbf{x}^T\mathbf{w})\big)\,\mathbf{w}$$
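The same kind of finite-difference check can be applied here; in the sketch below, $\mathbf{x}$ and $\mathbf{w}$ are arbitrary random vectors and only the derivative with respect to $\mathbf{w}$ is verified:

    import numpy as np

    def sigmoid(t):
        return 1.0 / (1.0 + np.exp(-t))

    rng = np.random.default_rng(1)
    n = 4
    x = rng.normal(size=n)            # column vectors represented as 1-D arrays
    w = rng.normal(size=n)

    s = sigmoid(x @ w)
    grad_analytic = s * (1.0 - s) * x         # chain-rule result for d y_hat / d w

    # Central finite differences with respect to w
    eps = 1e-6
    grad_numeric = np.array([
        (sigmoid(x @ (w + eps * e)) - sigmoid(x @ (w - eps * e))) / (2 * eps)
        for e in np.eye(n)
    ])

    print(np.allclose(grad_analytic, grad_numeric))   # True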
4 Applying Forward and Backward Propagation in a computational graph
Our goal is to apply both forward and backward propagation to the following graph:

[Figure: computational graph with inputs x and y, nodes h = x + y, i = x × y, g = h + i and output f = 4 × g.]
4.1 Forward Propagation
[Figure: the computational graph annotated with the forward-pass values at each node.]
We observe that the final output of the graph is 44.
4.2 Backward Propagation
In order to compute the results of the Backward Propagation method, we must first define the functions that we observe in each node of the graph. Starting from the lower layers, we have the following functions:
$$h = x + y \qquad i = x \times y \qquad g = h + i = x + y + x \times y \qquad f = 4 \times g = 4 \times (x + y + x \times y)$$
We must now define the local derivatives of each node, with respect to their inputs:

$$h: \; \tfrac{\partial h}{\partial x} = 1, \;\; \tfrac{\partial h}{\partial y} = 1 \qquad\qquad i: \; \tfrac{\partial i}{\partial x} = y, \;\; \tfrac{\partial i}{\partial y} = x$$
$$g: \; \tfrac{\partial g}{\partial h} = 1, \;\; \tfrac{\partial g}{\partial i} = 1 \qquad\qquad f: \; \tfrac{\partial f}{\partial g} = 4$$
We are going to compute all the gradients based on the rule downstream = local × upstream, starting at the output layer of the graph, whose gradient is 1. Thus, for each node we have:

$$f: \; \tfrac{\partial f}{\partial f} = 1, \quad \tfrac{\partial f}{\partial g} = 4 \times 1 = 4 \qquad\qquad g: \; \tfrac{\partial f}{\partial h} = 1 \times 4 = 4, \quad \tfrac{\partial f}{\partial i} = 1 \times 4 = 4$$
$$h: \; \text{to } x:\; 1 \times 4 = 4, \quad \text{to } y:\; 1 \times 4 = 4 \qquad\qquad i: \; \text{to } x:\; y \times 4 = 4y, \quad \text{to } y:\; x \times 4 = 4x$$

Summing the contributions that reach each input, the gradients at the inputs are $\tfrac{\partial f}{\partial x} = 4 + 4y$ and $\tfrac{\partial f}{\partial y} = 4 + 4x$.
Thus, the final look of the graph with the back-propagation results (downstream gradients) is the following:

[Figure: the computational graph annotated with the downstream gradients on every edge.]
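For completeness, the forward and backward passes above can be reproduced with a few lines of Python. Since the concrete input values of the graph are not reproduced in this document, x = 2 and y = 3 below are placeholder values (any pair with x + y + x·y = 11 yields the forward output of 44):

    # Placeholder inputs: any pair with x + y + x*y == 11 gives the forward output 44
    x, y = 2.0, 3.0

    # Forward pass
    h = x + y           # sum node
    i = x * y           # product node
    g = h + i           # g = h + i
    f = 4 * g           # output node, f = 4g
    print(f)            # 44.0 for these placeholder inputs

    # Backward pass: downstream = local * upstream, starting from df/df = 1
    df_df = 1.0
    df_dg = 4.0 * df_df               # f = 4g     ->  local derivative 4
    df_dh = 1.0 * df_dg               # g = h + i  ->  dg/dh = 1
    df_di = 1.0 * df_dg               # g = h + i  ->  dg/di = 1
    df_dx = 1.0 * df_dh + y * df_di   # h contributes 1, i contributes y
    df_dy = 1.0 * df_dh + x * df_di   # h contributes 1, i contributes x
    print(df_dx, df_dy)               # 4 + 4y = 16.0 and 4 + 4x = 12.0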
5 Sentiment Classifier using feed-forward NNs
Two different Python notebooks were created and can be found in the parent directory. I chose to construct models using two different types of features:
• TF-IDF features
• GloVe features, using the pre-trained vectors
Both notebooks are well documented. In general, I observed that the model that used GloVe (along with an embedding layer) performed better than the TF-IDF one. I compared all the models using F1 score, accuracy, precision and recall, and produced the appropriate plots.
Note: In the first notebook, I only used 10% of the dataset to train the model, because Colab's RAM could not handle all of it.
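For reference, a minimal sketch of the TF-IDF variant is given below. It is not the actual notebook code: the dataset path, the "text"/"label" column names, the hidden size and the training settings are all placeholder assumptions.

    # Minimal sketch of the TF-IDF feed-forward classifier (not the actual notebook code).
    # Assumptions: the data is a CSV with hypothetical "text" and "label" (0/1) columns.
    import pandas as pd
    import tensorflow as tf
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("reviews.csv")                       # hypothetical dataset file
    texts, labels = df["text"].values, df["label"].values

    # TF-IDF features; the vocabulary is capped so the dense matrix stays small
    vectorizer = TfidfVectorizer(max_features=5000)
    X = vectorizer.fit_transform(texts).toarray()

    X_train, X_val, y_train, y_val = train_test_split(X, labels, test_size=0.2, random_state=0)

    # Simple feed-forward network on top of the TF-IDF vectors
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(X.shape[1],)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(1, activation="sigmoid"),   # binary sentiment output
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=5, batch_size=64)

The GloVe variant differs mainly in the input: the TF-IDF vectors are replaced by token indices fed through an embedding layer initialised from the pre-trained GloVe vectors.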
