IFT6135 Assignment 1: Multilayer Perceptrons and Convolutional Neural Networks

Question 1 (4-4-4). Using the following definition of the derivative and the definition of the Heaviside step function :

f′(x) = lim_{h→0} [f(x + h) − f(x)] / h

H(x) =  1    if x > 0
        1/2  if x = 0
        0    if x < 0

1.    Show that the derivative of the rectified linear unit g(x) = max{0,x}, wherever it exists, is equal to the Heaviside step function.

2.    Give two alternative definitions of g(x) using H(x).
 

3.    Show that H(x) can be well approximated by the sigmoid function σ(kx) = 1/(1 + e^(−kx)) asymptotically (i.e. for large k), where k is a parameter.
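
As an informal numerical check of this question (not a proof), the sketch below compares σ(kx) against H(x) for growing k, and compares a finite-difference derivative of the ReLU against H(x) away from x = 0. It assumes the half-maximum convention H(0) = 1/2 from the definition above; the test points are arbitrary.

    import numpy as np

    def heaviside(x):
        # Heaviside step with the half-maximum convention H(0) = 1/2 (assumed above)
        return np.where(x > 0, 1.0, np.where(x < 0, 0.0, 0.5))

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    for k in [1, 10, 100]:
        # sigmoid(k*x) approaches H(x) pointwise as k grows
        print(k, np.abs(sigmoid(k * x) - heaviside(x)).max())

    # away from x = 0, a central finite difference of ReLU matches H(x)
    relu = lambda t: np.maximum(0.0, t)
    xs, h = np.array([-2.0, -0.5, 0.5, 2.0]), 1e-6
    print(np.allclose((relu(xs + h) - relu(xs - h)) / (2 * h), heaviside(xs), atol=1e-4))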

 

Question 2 (3-3-3-3). Recall the definition of the softmax function : S(x)_i = e^(x_i) / Σ_j e^(x_j).

1.    Show that softmax is translation-invariant, that is : S(x+c) = S(x), where c is a scalar constant.

2.    Show that softmax is not invariant under scalar multiplication. Let S_c(x) = S(cx), where c ≥ 0. What are the effects of taking c to be 0 and arbitrarily large?

3.    Let x be a 2-dimensional vector. One can represent a 2-class categorical probability using softmax S(x). Show that S(x) can be reparameterized using the sigmoid function, i.e. S(x) = [σ(z), 1 − σ(z)] where z is a scalar function of x.

4.    Let x be a K-dimensional vector (K ≥ 2). Show that S(x) can be represented using K − 1 parameters, i.e. S(x) = S([0,y1,y2,...,yK−1]) where yi is a scalar function of x for i ∈ {1,...,K− 1}.
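
The following sketch is an informal numerical check, not a proof: it verifies parts 1 and 3 on arbitrary test vectors and illustrates the limiting behaviours asked about in part 2.

    import numpy as np

    def softmax(x):
        z = x - x.max()                      # subtracting the max is justified by part 1
        e = np.exp(z)
        return e / e.sum()

    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    x = np.array([0.3, -1.2, 2.5])
    print(np.allclose(softmax(x + 7.0), softmax(x)))   # part 1: translation invariance
    print(softmax(0.0 * x))                            # part 2: c = 0 gives the uniform distribution
    print(softmax(50.0 * x))                           # part 2: large c concentrates mass on the max entry

    # part 3: in 2 dimensions, z = x1 - x2 is one valid choice of z
    v = np.array([0.8, -0.4])
    z = v[0] - v[1]
    print(np.allclose(softmax(v), [sigmoid(z), 1.0 - sigmoid(z)]))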

 

Question 3 (16). Consider a 2-layer neural network y : R^D → R^K of the form :


for 1 ≤ k ≤ K, with parameters Θ = (ω^(1), ω^(2)) and logistic sigmoid activation function σ. Show that there exists an equivalent network of the same form, with parameters Θ′ = (ω̃^(1), ω̃^(2)) and tanh activation function, such that y(x; Θ) = y(x; Θ′) for all x ∈ R^D, and express Θ′ as a function of Θ.
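
Since the precise network formula is not shown above, the sketch below assumes one common form of such a network (sigmoid hidden units and sigmoid outputs, with bias terms), which is an assumption rather than the assignment's exact definition. It checks numerically that switching the hidden activation to tanh and rescaling the parameters via σ(a) = (tanh(a/2) + 1)/2 leaves the function unchanged.

    import numpy as np

    rng = np.random.default_rng(0)
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

    D, M, K = 3, 5, 2                                  # arbitrary input / hidden / output sizes
    W1, b1 = rng.normal(size=(M, D)), rng.normal(size=M)
    W2, b2 = rng.normal(size=(K, M)), rng.normal(size=K)

    def net_sigmoid(x):
        # assumed form: sigmoid hidden units and sigmoid outputs, with biases
        return sigmoid(W2 @ sigmoid(W1 @ x + b1) + b2)

    # reparameterization implied by sigma(a) = (tanh(a/2) + 1)/2
    W1t, b1t = W1 / 2, b1 / 2
    W2t, b2t = W2 / 2, b2 + W2.sum(axis=1) / 2

    def net_tanh(x):
        # same architecture, tanh hidden units, rescaled parameters
        return sigmoid(W2t @ np.tanh(W1t @ x + b1t) + b2t)

    x = rng.normal(size=D)
    print(np.allclose(net_sigmoid(x), net_tanh(x)))    # True: the two networks coincide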

Question 4 (5-5). Fundamentally, back-propagation is just a special case of reverse-mode Automatic Differentiation (AD), applied to a neural network. Based on the “three-part” notation shown in Tables 1 and 4, represent the evaluation trace and the derivative (adjoint) trace for the following examples. In the last column of your solution, numerically evaluate each value to 4 decimal places.

1.    Forward AD, with y = f(x1, x2) = 1/(x1 + x2) + x2² + cos(x1) at (x1, x2) = (3, 6) and setting ẋ1 = 1 to compute ∂y/∂x1.

2.    Reverse AD, with y = f(x1, x2) = 1/(x1 + x2) + x2² + cos(x1) at (x1, x2) = (3, 6). Setting ȳ = 1, ∂y/∂x1 and ∂y/∂x2 can be computed together.
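
As a way of checking the final column of both traces, here is a minimal dual-number sketch of forward-mode AD (the Dual class and cos helper are illustrative names, not the notation of the referenced tables), together with the analytic gradient as a cross-check for the reverse-mode result.

    import math

    class Dual:
        # forward-mode AD value: primal (val) and tangent (dot)
        def __init__(self, val, dot=0.0):
            self.val, self.dot = val, dot
        def __add__(self, o):
            o = o if isinstance(o, Dual) else Dual(o)
            return Dual(self.val + o.val, self.dot + o.dot)
        __radd__ = __add__
        def __mul__(self, o):
            o = o if isinstance(o, Dual) else Dual(o)
            return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)
        __rmul__ = __mul__
        def __rtruediv__(self, c):                     # c / Dual, used for 1/(x1 + x2)
            return Dual(c / self.val, -c * self.dot / self.val ** 2)

    def cos(u):
        return Dual(math.cos(u.val), -math.sin(u.val) * u.dot)

    def f(x1, x2):
        return 1 / (x1 + x2) + x2 * x2 + cos(x1)

    y = f(Dual(3.0, 1.0), Dual(6.0, 0.0))              # seed x1_dot = 1
    print(round(y.val, 4), round(y.dot, 4))            # 35.1211  -0.1535

    x1, x2 = 3.0, 6.0                                  # analytic gradient, for the reverse trace
    print(round(-1 / (x1 + x2) ** 2 - math.sin(x1), 4),   # dy/dx1 = -0.1535
          round(-1 / (x1 + x2) ** 2 + 2 * x2, 4))         # dy/dx2 = 11.9877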

Question 5 (6). Compute the full, valid, and same convolution (with kernel flipping) for the following 1D arrays :
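
The input arrays themselves are not reproduced here; with placeholder arrays (chosen only for illustration), np.convolve, which flips the kernel and therefore computes a true convolution, produces all three outputs.

    import numpy as np

    x = np.array([1.0, 2.0, 0.0, -1.0, 3.0])    # placeholder signal, not the assignment's data
    w = np.array([1.0, 0.5, -1.0])              # placeholder kernel

    print(np.convolve(x, w, mode="full"))       # length len(x) + len(w) - 1
    print(np.convolve(x, w, mode="same"))       # length len(x), centred on the full output
    print(np.convolve(x, w, mode="valid"))      # length len(x) - len(w) + 1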

 

Question 6 (5-5). Consider a convolutional neural network. Assume the input is a color image of size 256 × 256 in the RGB representation. The first layer convolves 64 kernels of size 8 × 8 with the input, using a stride of 2 and no padding. The second layer downsamples the output of the first layer with 5 × 5 non-overlapping max pooling. The third layer convolves 128 kernels of size 4 × 4 with a stride of 1 and a zero-padding of size 1 on each border.

1.    What is the dimensionality (scalar) of the output of the last layer?

2.    Not including the biases, how many parameters are needed for the last layer?
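
A small sketch, assuming the usual floor-based output-size formula o = ⌊(i + 2p − k)/s⌋ + 1 for both convolution and pooling, that walks the input through the three layers described above.

    def out_size(i, k, s=1, p=0):
        # output size of a convolution or pooling layer (no dilation)
        return (i + 2 * p - k) // s + 1

    h = out_size(256, k=8, s=2)        # layer 1: 64 channels, spatial size 125
    h = out_size(h, k=5, s=5)          # layer 2: 5x5 non-overlapping max pooling -> 25
    h = out_size(h, k=4, s=1, p=1)     # layer 3: 128 channels, spatial size 24
    print(128 * h * h)                 # total output dimensionality
    print(128 * 64 * 4 * 4)            # weight count of the last layer (biases excluded)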

 

 

Question 7 (4-4-6). Assume we are given data of size 3 × 64 × 64. In what follows, provide a correct configuration of a convolutional neural network layer that satisfies the specified assumption. Answer with the kernel window size (k), stride (s), padding (p), and dilation (d, with the convention d = 1 for no dilation). Use square windows only (i.e. the same k for both width and height).

1.    The output shape (o) of the first layer is (64,32,32).

(a)   Assume k = 8 without dilation.

(b)  Assume d = 7, and s = 2.

2.    The output shape of the second layer is (64,8,8). Assume p = 0 and d = 1.

(a)   Specify k and s for pooling with non-overlapping window.

(b)  What is the output shape if k = 8 and s = 4 instead?

3.    The output shape of the last layer is (128,4,4).

(a)   Assume we are not using padding or dilation.

(b)  Assume d = 2, p = 2.

(c)   Assume p = 1, d = 1.
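
Candidate configurations for any of the parts above can be checked with the dilation-aware output-size formula o = ⌊(i + 2p − d(k − 1) − 1)/s⌋ + 1 (the PyTorch convention); the setting shown below is one example, not the only valid answer.

    def out_size(i, k, s=1, p=0, d=1):
        # output size of a (possibly dilated) convolution, PyTorch convention
        return (i + 2 * p - d * (k - 1) - 1) // s + 1

    # e.g. one candidate for part 1(a): k = 8, s = 2, p = 3, d = 1 maps 64 -> 32
    print(out_size(64, k=8, s=2, p=3, d=1))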
