
EEE443-Assignment 1 Solved

Question 1.
A single neuron receives input from $m$ input neurons with weights $w_i$, where $i \in [1, m]$.

The neuron is expected to predict the probability that the output $t$ belongs to Class A ($t = 1$) versus Class B ($t = -1$). A dataset of training samples is available with inputs $x_n$ and outputs $y_n$ ($n \in [1, N]$). You are told that the maximum a posteriori estimate for the network weights is obtained by solving the following optimization problem:

$$\arg\min_{W} \sum_{n} \left( y_n - h(x_n, W) \right)^2 + \beta \sum_{i} w_i^2 \qquad (1)$$

where $W$ is the vector of weights $w_i$, $\beta$ is a scalar constant, and $h(\cdot)$ is the output of the neuron. According to this estimate, derive the prior probability distribution of the network weights analytically.
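
As a hedged sketch of how the objective in (1) can be tied to a prior (not a complete solution), assume, beyond what the problem states, that the outputs carry additive Gaussian noise of variance $\sigma^2$. The symbol $\sigma^2$ is introduced here for illustration only. Under that assumption, the MAP estimate splits into a likelihood term and a prior term:

$$\hat{W}_{\text{MAP}} = \arg\max_{W} \; p(W \mid \{x_n, y_n\}) = \arg\min_{W} \left[ -\sum_{n} \log p(y_n \mid x_n, W) - \log p(W) \right]$$

$$-\log p(y_n \mid x_n, W) = \frac{\left( y_n - h(x_n, W) \right)^2}{2\sigma^2} + \text{const}$$

Multiplying the bracketed expression by $2\sigma^2$ and matching the remaining term against (1) gives $-\log p(W) = \frac{\beta}{2\sigma^2} \sum_i w_i^2 + \text{const}$, so that

$$p(W) \propto \exp\left( -\frac{\beta}{2\sigma^2} \sum_{i} w_i^2 \right)$$

which, under the stated assumption, is a product of independent zero-mean Gaussians over the weights, each with variance $\sigma^2 / \beta$.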


Question 2.
An engineer would like to design a neural network with a single hidden layer with four input neurons (with binary inputs) and a single output neuron to implement:

(X1 OR NOT X2) XOR (NOT X3 OR NOT X4)

Assume a hidden layer with four hidden units, and a unipolar activation function (i.e., the step function). Answer the questions below.

a) For each hidden unit, analytically derive the set of inequalities based on which a set of weights and an activation threshold can be selected.

b) Choose a particular weight vector (including the bias term), and show that the designed network achieves 100% performance in implementing the desired logic.

c) Now assume that the input data samples are subject to small random fluctuations due to noise. Will the network you designed in part (a) function robustly under noisy conditions? Find the set of weights and the activation threshold for the most robust decision boundary.

d) Generate 100 input samples by first concatenating 25 samples from each input vector. Generate a random noise vector of length 2 for each training sample, assuming a zero-mean Gaussian distribution with a standard deviation of 0.2. Form validation samples for testing the networks by linearly superposing the input samples and the random noise samples. Evaluate the classification performance (i.e., percentage correct) of the networks designed in parts (a) and (c) on the validation samples. Interpret your results.
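
The checks asked for in this question can be sketched in Python with NumPy, as a rough illustration rather than a reference solution. The weights below are one hand-picked choice and are not given in the assignment. They come from rewriting the target as an OR of four AND terms: X1·X3·X4, (NOT X2)·X3·X4, (NOT X1)·X2·(NOT X3) and (NOT X1)·X2·(NOT X4). The names `W_HIDDEN`, `B_HIDDEN`, `W_OUT`, `B_OUT`, `forward` and `noisy_accuracy` are hypothetical. The sketch also assumes that the noise vector has one entry per input (four here) and that 25 copies are taken of each of the 16 binary patterns.

```python
import itertools
import numpy as np

def target(X):
    """Desired logic (X1 OR NOT X2) XOR (NOT X3 OR NOT X4) for rows of 0/1 inputs."""
    x1, x2, x3, x4 = (X[:, k] > 0.5 for k in range(4))
    return ((x1 | ~x2) ^ (~x3 | ~x4)).astype(int)

def step(v):
    """Unipolar step activation: 1 where v >= 0, else 0."""
    return (v >= 0).astype(int)

# Hypothetical hand-picked weights (an assumption, not from the assignment).
# Each row is one hidden unit implementing one AND term.
W_HIDDEN = np.array([[ 1.0,  0.0,  1.0,  1.0],   # X1 AND X3 AND X4
                     [ 0.0, -1.0,  1.0,  1.0],   # NOT X2 AND X3 AND X4
                     [-1.0,  1.0, -1.0,  0.0],   # NOT X1 AND X2 AND NOT X3
                     [-1.0,  1.0,  0.0, -1.0]])  # NOT X1 AND X2 AND NOT X4
B_HIDDEN = np.array([-2.5, -1.5, -0.5, -0.5])
W_OUT = np.ones(4)   # output unit acts as an OR over the hidden units
B_OUT = -0.5

def forward(X):
    """Forward pass of the single-hidden-layer step network."""
    hidden = step(X @ W_HIDDEN.T + B_HIDDEN)
    return step(hidden @ W_OUT + B_OUT)

# All 16 binary input patterns.
patterns = np.array(list(itertools.product([0, 1], repeat=4)), dtype=float)
clean_acc = np.mean(forward(patterns) == target(patterns))

def noisy_accuracy(n_copies=25, std=0.2, seed=0):
    """Fraction correct on noisy copies; labels come from the clean inputs."""
    rng = np.random.default_rng(seed)
    X = np.repeat(patterns, n_copies, axis=0)
    noise = rng.normal(0.0, std, size=X.shape)
    return np.mean(forward(X + noise) == target(X))

print("clean accuracy:", clean_acc)
print("noisy accuracy:", noisy_accuracy())
```

Swapping in a different set of thresholds in `B_HIDDEN` and `B_OUT` and rerunning `noisy_accuracy` gives a direct way to compare two designs on the same noisy samples, since the seed fixes the noise.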

Question 3.
A researcher would like to process images of alphabet letters with a perceptron. A collection of images was compiled for training and testing the perceptron. The file assign1_data1.mat contains the variables trainims (training images) and testims (testing images), along with the ground truth labels in trainlbls and testlbls. Answer the questions below.

a) Visualize a sample image for each class. Find correlation coefficients between pairs of sample images that you have selected. Display the correlations in matrix format. Discuss the degree of within-class versus across-class variability.

b) Design a single-layer perceptron with an output neuron for each letter, using the training data. Set the initial network weights $w$ and bias term $b$ as random numbers drawn from a Gaussian distribution $\mathcal{N}(0, 0.01)$, and assume a sigmoid activation function. Your implementation should not train each output neuron separately; instead, define a compound matrix $W$ and a compound vector $b$ and use them to update all connections simultaneously. The online training algorithm should perform 10000 iterations. At each iteration, a sample image should be randomly selected from the training data, the network should be updated according to the gradient-descent learning rule, and $W$, $b$, and the mean-squared error (MSE) should be recorded. Tune the learning rate $\eta^*$ in order to minimize the final value of the MSE. Display the final network weights for each letter as a separate image, and describe the visual characteristics.

c) Now separately repeat the training process using a substantially higher and a substantially lower value than $\eta^*$. On a single figure, plot the MSE curves (across all 10000 iterations) for $\eta_{\text{high}}$, $\eta_{\text{low}}$ and $\eta^*$. Discuss your results.

d) Validate the performance of the trained networks using all samples in the test data. Report the performance values for the three networks with $\eta_{\text{high}}$, $\eta_{\text{low}}$ and $\eta^*$.
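
The training procedure described above can be sketched in Python with NumPy, as an illustration under stated assumptions rather than a definitive implementation. It runs on small synthetic data instead of assign1_data1.mat, whose layout is not specified here. The helper names (`correlation_matrix`, `train_online`, `accuracy`) are hypothetical. Targets are assumed to be one-hot vectors with 0-based labels, and the 0.01 in N(0, 0.01) is assumed to be a variance (standard deviation 0.1).

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def correlation_matrix(samples):
    """Pairwise Pearson correlations between flattened images (one per row)."""
    return np.corrcoef(samples.reshape(len(samples), -1))

def train_online(X, T, eta, n_iter=10000, seed=0):
    """Online gradient descent on the squared error of a sigmoid layer.

    X: (n_samples, n_pixels) inputs; T: (n_samples, n_classes) one-hot targets.
    Returns the compound weight matrix W, bias vector b, and per-iteration MSE.
    """
    rng = np.random.default_rng(seed)
    n_classes, n_pixels = T.shape[1], X.shape[1]
    # Assumption: N(0, 0.01) is read as variance 0.01, i.e. std 0.1.
    W = rng.normal(0.0, 0.1, size=(n_classes, n_pixels))
    b = rng.normal(0.0, 0.1, size=n_classes)
    mse = np.empty(n_iter)
    for it in range(n_iter):
        k = rng.integers(len(X))          # pick one random training sample
        x, t = X[k], T[k]
        y = sigmoid(W @ x + b)
        err = t - y
        mse[it] = np.mean(err ** 2)
        delta = err * y * (1.0 - y)       # error times sigmoid derivative
        W += eta * np.outer(delta, x)     # update all output neurons at once
        b += eta * delta
    return W, b, mse

def accuracy(W, b, X, labels):
    """Fraction of samples whose most active output matches the label."""
    return np.mean(np.argmax(sigmoid(X @ W.T + b), axis=1) == labels)

# Synthetic stand-in data (an assumption): 3 classes, 12 "pixels",
# each class lights up its own block of 4 pixels, plus small noise.
rng = np.random.default_rng(1)
prototypes = np.kron(np.eye(3), np.ones(4))
labels = rng.integers(3, size=300)
X = prototypes[labels] + 0.05 * rng.normal(size=(300, 12))
T = np.eye(3)[labels]

C = correlation_matrix(prototypes)
W, b, mse = train_online(X, T, eta=0.5, n_iter=2000)
acc = accuracy(W, b, X, labels)
print("correlations:\n", np.round(C, 2))
print("final accuracy on the synthetic data:", acc)
```

Calling `train_online` with three different `eta` values and plotting the three returned `mse` arrays on one figure would give the comparison asked for in part c); each row of `W`, reshaped to the image size, is the weight image asked for in part b).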
