The goal of this assignment is to implement neural networks to recognize hand-written digits in the MNIST data.
MNIST Data
You will use the MNIST hand-written digit dataset to perform the first task (neural network). We reduce the image size (28 × 28 → 14 × 14) and subsample the data. You can download the training and testing data from here: http://www.cs.umn.edu/~hspark/csci5561/ReducedMNIST.zip
Description: The zip file includes two MAT files (mnist_train.mat and mnist_test.mat). Each file includes im_* and label_* variables:
• im_* is a matrix (196 × n) storing vectorized image data (196 = 14 × 14)
• label_* is an n × 1 vector storing the label for each image.
n is the number of images. You can visualize the ith image, e.g., imshow(uint8(reshape(im_train(:,i), [14,14]))).
1 Single-layer Linear Perceptron
Figure 2: You will implement a single-layer linear perceptron that produces an accuracy near 30% on testing data (random chance is 10%).
You will implement a single-layer linear perceptron (Figure 2(a)) with a stochastic gradient descent method. We provide main_slp_linear where you will implement GetMiniBatch and TrainSLP_linear.
function [mini_batch_x, mini_batch_y] = GetMiniBatch(im_train, label_train, batch_size)
Input: im_train and label_train are a set of images and labels, and batch_size is the size of the mini-batch for stochastic gradient descent.
Output: mini_batch_x and mini_batch_y are cells that contain a set of batches (images and labels, respectively). Each batch of images is a matrix of size 196×batch_size, and each batch of labels is a matrix of size 10×batch_size (one-hot encoding). Note that the number of images in the last batch may be smaller than batch_size. Description: You may randomly permute the order of images when building the batches, and the union of all mini_batch_* cells must span the entire training data.
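A minimal sketch of one way to build the batches (the permutation strategy and loop structure are one choice among several):
function [mini_batch_x, mini_batch_y] = GetMiniBatch(im_train, label_train, batch_size)
% Randomly permute the training data and split it into mini-batches.
n = size(im_train, 2);
perm = randperm(n);
n_batch = ceil(n / batch_size);
mini_batch_x = cell(1, n_batch);
mini_batch_y = cell(1, n_batch);
for k = 1 : n_batch
    idx = perm((k-1)*batch_size + 1 : min(k*batch_size, n));
    mini_batch_x{k} = im_train(:, idx);
    lab = label_train(idx);
    y = zeros(10, numel(idx));          % one-hot encoding: digit d -> row d+1
    for j = 1 : numel(idx)
        y(lab(j) + 1, j) = 1;
    end
    mini_batch_y{k} = y;
end
end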
function y = FC(x, w, b)
Input: x∈Rm is the input to the fully connected layer, and w∈Rn×m and b∈Rn are the weights and bias.
Output: y∈Rn is the output of the linear transform (fully connected layer). Description: FC is a linear transform of x, i.e., y = wx + b.
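A one-line sketch of the forward pass:
function y = FC(x, w, b)
% Fully connected layer: linear transform y = w*x + b.
y = w * x(:) + b(:);
end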
function [dLdx dLdw dLdb] = FC_backward(dLdy, x, w, b, y)
Input: dLdy∈R1×n is the loss derivative with respect to the output y.
Output: dLdx∈R1×m is the loss derivative with respect to the input x, dLdw∈R1×(n×m) is the loss derivative with respect to the weights, and dLdb∈R1×n is the loss derivative with respect to the bias.
Description: The partial derivatives w.r.t. input, weights, and bias will be computed. dLdx will be back-propagated, and dLdw and dLdb will be used to update the weights and bias.
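A sketch under the row-vector convention above; dLdw is returned as a 1×(n·m) row that can be reshaped back to the size of w when updating:
function [dLdx, dLdw, dLdb] = FC_backward(dLdy, x, w, b, y)
% Back-propagation through y = w*x + b, with dLdy given as a 1 x n row vector.
dLdx = dLdy * w;                          % 1 x m
dLdw = reshape(dLdy(:) * x(:)', 1, []);   % 1 x (n*m); reshape(dLdw, size(w)) recovers the n x m gradient
dLdb = dLdy;                              % 1 x n
end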
function [L, dLdy] = Loss_euclidean(y_tilde, y)
Input: y_tilde∈Rm is the prediction, and y∈{0,1}m is the ground truth label.
Output: L∈R is the loss, and dLdy is the loss derivative with respect to the prediction. Description: Loss_euclidean measures the Euclidean distance L = ‖ỹ − y‖².
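A sketch using the squared Euclidean distance and returning the derivative as a row vector, matching the dLdy convention used by FC_backward above:
function [L, dLdy] = Loss_euclidean(y_tilde, y)
% Squared Euclidean distance between the prediction and the one-hot label.
diff = y_tilde(:) - y(:);
L = sum(diff .^ 2);
dLdy = 2 * diff';        % 1 x m row vector
end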
function [w, b] = TrainSLP_linear(mini_batch_x, mini_batch_y)
Input: mini_batch_x and mini_batch_y are cells where each cell is a batch of images and labels.
Output: w∈R10×196 and b∈R10×1 are the trained weights and bias of a single-layer perceptron.
Description: You will use FC, FC_backward, and Loss_euclidean to train a single-layer perceptron using a stochastic gradient descent method; pseudo-code can be found below. Through training, you are expected to see a reduction of the loss as shown in Figure 2(b). As a result of training, the network should produce an accuracy of more than 25% on the testing data (Figure 2(c)).
Algorithm 1 Stochastic Gradient Descent based Training
1: Set the learning rate γ
2: Set the decay rate λ ∈ (0,1]
3: Initialize the weights with a Gaussian noise w ∈N(0,1)
4: k = 1
5: for iIter = 1 : nIters do
6: At every 1000th iteration, γ ← λγ
7: ∂L/∂w ← 0 and ∂L/∂b ← 0
8: for each image xi in the kth mini-batch do
9: Label prediction of xi
10: Loss computation l
11: Compute the gradients ∂l/∂w and ∂l/∂b for xi using back-propagation.
12: ∂L/∂w ← ∂L/∂w + ∂l/∂w and ∂L/∂b ← ∂L/∂b + ∂l/∂b
13: end for
14: k++ (Set k = 1 if k is greater than the number of mini-batches.)
15: Update the weights, w ← w − (γ/R)∂L/∂w, and bias, b ← b − (γ/R)∂L/∂b, where R is the mini-batch size.
16: end for
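A sketch of Algorithm 1 for the linear perceptron; the learning rate, decay rate, iteration count, and the [0, 1] scaling of the images are illustrative choices, not prescribed ones:
function [w, b] = TrainSLP_linear(mini_batch_x, mini_batch_y)
% Stochastic gradient descent following Algorithm 1.
gamma = 0.01;                    % learning rate (illustrative)
lambda = 0.9;                    % decay rate (illustrative)
nIters = 5000;                   % number of iterations (illustrative)
w = randn(10, 196);              % Gaussian initialization
b = randn(10, 1);
n_batch = numel(mini_batch_x);
k = 1;
for iIter = 1 : nIters
    if mod(iIter, 1000) == 0
        gamma = lambda * gamma;                         % decay the learning rate
    end
    dLdw_sum = zeros(1, numel(w));
    dLdb_sum = zeros(1, numel(b));
    R = size(mini_batch_x{k}, 2);                       % images in this mini-batch
    for i = 1 : R
        x = double(mini_batch_x{k}(:, i)) / 255;        % scale to [0, 1] (illustrative)
        y_gt = mini_batch_y{k}(:, i);
        y_tilde = FC(x, w, b);                          % label prediction
        [~, dLdy] = Loss_euclidean(y_tilde, y_gt);      % loss derivative
        [~, dLdw, dLdb] = FC_backward(dLdy, x, w, b, y_tilde);
        dLdw_sum = dLdw_sum + dLdw;                     % accumulate gradients
        dLdb_sum = dLdb_sum + dLdb;
    end
    k = k + 1;
    if k > n_batch, k = 1; end
    w = w - gamma / R * reshape(dLdw_sum, size(w));     % update weights
    b = b - gamma / R * dLdb_sum';                      % update bias
end
end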
2 Single-layer Perceptron
Figure 3: You will implement a single-layer perceptron that produces an accuracy near 90% on testing data.
You will implement a single-layer perceptron with soft-max cross-entropy using a stochastic gradient descent method. We provide main_slp where you will implement TrainSLP. Unlike the single-layer linear perceptron, it has a soft-max layer that approximates a max function by mapping the output to the [0, 1] range, as shown in Figure 3(a).
function [L, dLdy] = Loss_cross_entropy_softmax(x, y)
Input: x∈Rm is the input to the soft-max, and y∈{0,1}m is the ground truth label.
Output: L∈R is the loss, and dLdy is the loss derivative with respect to x.
Description: Loss_cross_entropy_softmax measures the cross-entropy L = −Σi yi log ỹi between the ground truth y and the soft-max output ỹ, where ỹi approximates the max operation by mapping x to the [0, 1] range:
ỹi = exp(xi) / Σj exp(xj),
where xi is the ith element of x.
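A sketch of the loss; subtracting the max and adding eps inside the log are numerical-stability choices, and the combined soft-max/cross-entropy derivative simplifies to ỹ − y:
function [L, dLdy] = Loss_cross_entropy_softmax(x, y)
% Soft-max followed by the cross-entropy loss against a one-hot label.
x = x(:); y = y(:);
e = exp(x - max(x));             % subtract the max for numerical stability
y_tilde = e / sum(e);            % soft-max output in [0, 1]
L = -sum(y .* log(y_tilde + eps));
dLdy = (y_tilde - y)';           % 1 x m; combined soft-max + cross-entropy derivative
end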
function [w, b] = TrainSLP(mini_batch_x, mini_batch_y)
Output: w∈R10×196 and b∈R10×1 are the trained weights and bias of a single-layer perceptron.
Description: You will use the following functions to train a single-layer perceptron using a stochastic gradient descent method: FC, FC_backward, Loss_cross_entropy_softmax.
Through training, you are expected to see a reduction of the loss as shown in Figure 3(b). As a result of training, the network should produce an accuracy of more than 85% on the testing data (Figure 3(c)).
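TrainSLP reuses the SGD loop of Algorithm 1; only the per-image loss changes. A sketch of the inner-loop replacement, with variable names following the TrainSLP_linear sketch above:
a = FC(x, w, b);                                   % 10 x 1 scores (input to the soft-max)
[~, dLda] = Loss_cross_entropy_softmax(a, y_gt);   % loss derivative w.r.t. the scores
[~, dLdw, dLdb] = FC_backward(dLda, x, w, b, a);   % gradients to accumulate as before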
3 Multi-layer Perceptron
Figure 4: You will implement a multi-layer perceptron that produces an accuracy of more than 90% on testing data.
You will implement a multi-layer perceptron with a single hidden layer using a stochastic gradient descent method. We provide main_mlp. The hidden layer is composed of 30 units as shown in Figure 4(a).
function [y] = ReLu(x)
Input: x is a general tensor, matrix, or vector.
Output: y is the output of the Rectified Linear Unit (ReLu) with the same size as the input. Description: ReLu is an activation unit (yi = max(0, xi)). In some cases, it is possible to use a Leaky ReLu (yi = max(εxi, xi)) with a small ε, e.g., ε = 0.01.
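A minimal sketch (the leaky variant is shown as a comment):
function [y] = ReLu(x)
% Element-wise rectified linear unit; output has the same size as the input.
y = max(x, 0);
% Leaky variant (optional): y = max(0.01 * x, x);
end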
function [dLdx] = ReLu_backward(dLdy, x, y)
Input: dLdy∈R1×z is the loss derivative with respect to the output y∈Rz, where z is the size of the input (which can be a tensor, matrix, or vector).
Output: dLdx∈R1×z is the loss derivative with respect to the input x.
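A sketch that returns the gradient in the same shape as dLdy, whether dLdy is a 1×z row vector or a tensor shaped like x:
function [dLdx] = ReLu_backward(dLdy, x, y)
% The gradient passes through only where the input was positive.
dLdx = dLdy .* (reshape(x, size(dLdy)) > 0);
end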
function [w1, b1, w2, b2] = TrainMLP(mini_batch_x, mini_batch_y)
Output: w1 ∈R30×196, b1 ∈R30×1, w2 ∈R10×30, b2 ∈R10×1 are the trained weights and biases of a multi-layer perceptron.
Description: You will use the following functions to train a multi-layer perceptron using a stochastic gradient descent method: FC, FC_backward, ReLu, ReLu_backward, Loss_cross_entropy_softmax. As a result of training, the network should produce an accuracy of more than 90% on the testing data (Figure 4(b)).
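A sketch of the per-image forward and backward pass for the 196-30-10 network, intended to sit inside the Algorithm 1 mini-batch loop; x and y_gt follow the TrainSLP_linear sketch above:
a1 = FC(x, w1, b1);                                    % 30 x 1 hidden pre-activation
f1 = ReLu(a1);                                         % 30 x 1 hidden activation
a2 = FC(f1, w2, b2);                                   % 10 x 1 output scores
[~, dLda2] = Loss_cross_entropy_softmax(a2, y_gt);     % loss and its derivative
[dLdf1, dLdw2, dLdb2] = FC_backward(dLda2, f1, w2, b2, a2);
dLda1 = ReLu_backward(dLdf1, a1, f1);
[~, dLdw1, dLdb1] = FC_backward(dLda1, x, w1, b1, a1);
% Accumulate dLdw1, dLdb1, dLdw2, dLdb2 over the mini-batch and update
% w1, b1, w2, b2 as in the TrainSLP_linear sketch.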
4 Convolutional Neural Network
Figure 5: You will implement a convolutional neural network that produces an accuracy of more than 92% on testing data. (a) CNN: Input → Conv (3) → ReLu → Pool (2×2) → Flatten → FC → Soft-max. (b) Confusion matrix (accuracy 0.947251).
You will implement a convolutional neural network (CNN) using a stochastic gradient descent method. We provide main_cnn. As shown in Figure 5(a), the network is composed of: a single channel input (14×14×1) → Conv layer (3×3 convolution with 3 channel output and stride 1) → ReLu layer → Max-pooling layer (2 × 2 with stride 2) → Flattening layer (147 units) → FC layer (10 units) → Soft-max.
function [y] = Conv(x, w_conv, b_conv)
Input: x∈RH×W×C1 is an input to the convolutional operation, and w_conv∈R3×3×C1×C2 and b_conv∈RC2 are the weights and bias of the convolutional operation.
Output: y∈RH×W×C2 is the output of the convolutional operation. Note that to get the same size as the input, you may pad zeros at the boundary of the input image. Description: This convolutional operation can be simplified using the MATLAB built-in function im2col.
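A sketch using im2col (Image Processing Toolbox), assuming a 3×3 kernel, stride 1, and one pixel of zero padding; as is standard for CNN layers, the kernel is applied as cross-correlation:
function [y] = Conv(x, w_conv, b_conv)
% 3x3 convolution (cross-correlation) with stride 1 and zero padding, via im2col.
[H, W, C1] = size(x);
C2 = numel(b_conv);
y = zeros(H, W, C2);
for c1 = 1 : C1
    xp = zeros(H + 2, W + 2);                    % one pixel of zero padding
    xp(2:end-1, 2:end-1) = x(:, :, c1);
    cols = im2col(xp, [3 3], 'sliding');         % 9 x (H*W) patch matrix
    for c2 = 1 : C2
        k = w_conv(:, :, c1, c2);
        y(:, :, c2) = y(:, :, c2) + reshape(k(:)' * cols, H, W);
    end
end
for c2 = 1 : C2
    y(:, :, c2) = y(:, :, c2) + b_conv(c2);      % add the bias per output channel
end
end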
function [dLdw, dLdb] = Conv_backward(dLdy, x, w_conv, b_conv, y)
Input: dLdy is the loss derivative with respect to y.
Output: dLdw and dLdb are the loss derivatives with respect to convolutional weights and bias w and b, respectively.
Description: This convolutional operation can be simplified using the MATLAB built-in function im2col. Note that for this single convolutional layer, dLdx is not needed.
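A sketch assuming dLdy has the same H×W×C2 shape as y, reusing the same zero padding and im2col layout as the forward pass:
function [dLdw, dLdb] = Conv_backward(dLdy, x, w_conv, b_conv, y)
% Loss derivatives w.r.t. the convolution weights and bias.
[H, W, C1] = size(x);
[kh, kw, ~, C2] = size(w_conv);
dLdw = zeros(size(w_conv));
dLdb = zeros(numel(b_conv), 1);
for c2 = 1 : C2
    g = dLdy(:, :, c2);
    dLdb(c2) = sum(g(:));                            % bias gradient
end
for c1 = 1 : C1
    xp = zeros(H + 2, W + 2);                        % same padding as the forward pass
    xp(2:end-1, 2:end-1) = x(:, :, c1);
    cols = im2col(xp, [kh kw], 'sliding');           % (kh*kw) x (H*W)
    for c2 = 1 : C2
        g = dLdy(:, :, c2);
        dLdw(:, :, c1, c2) = reshape(cols * g(:), kh, kw);   % weight gradient
    end
end
end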
function [y] = Pool2x2(x)
Input: x∈RH×W×C is a general tensor and matrix.
Output: y∈R(H/2)×(W/2)×C is the output of the 2×2 max-pooling operation with stride 2.
function [dLdx] = Pool2x2_backward(dLdy, x, y)
Input: dLdy is the loss derivative with respect to the output y.
Output: dLdx is the loss derivative with respect to the input x.
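Sketches of both directions, assuming dLdy has the same (H/2)×(W/2)×C shape as y:
function [y] = Pool2x2(x)
% 2x2 max-pooling with stride 2, applied per channel.
[H, W, C] = size(x);
y = zeros(H/2, W/2, C);
for c = 1 : C
    for i = 1 : H/2
        for j = 1 : W/2
            patch = x(2*i-1 : 2*i, 2*j-1 : 2*j, c);
            y(i, j, c) = max(patch(:));
        end
    end
end
end

function [dLdx] = Pool2x2_backward(dLdy, x, y)
% Route each gradient back to the arg-max location of its 2x2 window.
[H, W, C] = size(x);
dLdx = zeros(H, W, C);
for c = 1 : C
    for i = 1 : H/2
        for j = 1 : W/2
            patch = x(2*i-1 : 2*i, 2*j-1 : 2*j, c);
            [~, idx] = max(patch(:));
            [u, v] = ind2sub([2 2], idx);
            dLdx(2*(i-1)+u, 2*(j-1)+v, c) = dLdy(i, j, c);
        end
    end
end
end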
function [y] = Flattening(x)
Input: x∈RH×W×C is a tensor.
Output: y∈RHWC is the vectorized tensor (column major).
function [dLdx] = Flattening_backward(dLdy, x, y)
Input: dLdy is the loss derivative with respect to the output y.
Output: dLdx is the loss derivative with respect to the input x.
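Sketches of both directions (column-major, matching MATLAB's (:) operator):
function [y] = Flattening(x)
% Column-major vectorization of the input tensor.
y = x(:);
end

function [dLdx] = Flattening_backward(dLdy, x, y)
% Reshape the gradient back to the input tensor shape.
dLdx = reshape(dLdy, size(x));
end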
function [w_conv, b_conv, w_fc, b_fc] = TrainCNN(mini_batch_x, mini_batch_y)
Output: w_conv ∈ R3×3×1×3, b_conv ∈ R3, w_fc ∈ R10×147, b_fc ∈ R10×1 are the trained weights and biases of the CNN.
Description: You will use the following functions to train a convolutional neural network using a stochastic gradient descent method: Conv, Conv_backward, Pool2x2,
Pool2x2_backward, Flattening, Flattening_backward, FC, FC_backward, ReLu, ReLu_backward, Loss_cross_entropy_softmax. As a result of training, the network should produce an accuracy of more than 92% on the testing data (Figure 5(b)).
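A sketch of the per-image forward and backward pass, intended to sit inside the Algorithm 1 mini-batch loop; the /255 scaling and the final accumulation/update step (as in the TrainSLP_linear sketch) are illustrative choices:
x = reshape(double(mini_batch_x{k}(:, i)), 14, 14) / 255;  % 14 x 14 input (illustrative scaling)
y_gt = mini_batch_y{k}(:, i);
a = Conv(x, w_conv, b_conv);                               % 14 x 14 x 3
f = ReLu(a);
p = Pool2x2(f);                                            % 7 x 7 x 3
v = Flattening(p);                                         % 147 x 1
s = FC(v, w_fc, b_fc);                                     % 10 x 1 scores
[~, dLds] = Loss_cross_entropy_softmax(s, y_gt);
[dLdv, dLdw_fc, dLdb_fc] = FC_backward(dLds, v, w_fc, b_fc, s);
dLdp = Flattening_backward(dLdv, p, v);
dLdf = Pool2x2_backward(dLdp, f, p);
dLda = ReLu_backward(dLdf, a, f);
[dLdw_conv, dLdb_conv] = Conv_backward(dLda, x, w_conv, b_conv, a);
% Accumulate the four gradients over the mini-batch and update w_conv, b_conv,
% w_fc, b_fc as in the TrainSLP_linear sketch.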