Here we list the tasks that you need to complete, along with a brief description of each. Please also look at the comments provided in each function that you need to implement for further specification of the input and output formats and shapes.
The general structure of the codebase is as follows. There is a class called FullyConnectedLayer, which represents one fully connected linear layer followed by a non-linear activation function, which can be either ReLU or softmax. The NeuralNetwork class consists of a series of FullyConnectedLayers stacked one after the other, with the output of the last layer representing a probability distribution over the classes for the given input. For this reason, the activation of the last layer should always be the softmax function. Both of these classes are defined in nn.py.
In main.py, there are two tasks, taskXor and taskMnist, corresponding to the two datasets. In these functions, you need to define neural networks by adding fully connected layers. The code for the XOR dataset trains the model and prints the test accuracy at the end, while the code for the MNIST dataset trains the model and then uses the trained model to make predictions on the test set. Note that the answers to the test set have not been provided for the MNIST dataset.
Task 1 You need to implement the following functions in the FullyConnectedLayer class (illustrative sketches follow the list below):
a. __init__: Initialise the parameters (weights and biases) as needed. This part is not graded, but it is necessary for the rest of the assignment.
b. relu_of_X: Return ReLU(X) where X is the input
c. softmax_of_X: Return softmax(X) where X is the input. The output of this layer now represents a probability distribution over all the output classes
d. forwardpass: Compute the forward pass of a linear layer, making use of the above 2 functions. Any information computed here that is needed in the backward pass can be stored in the variable self.data.
e. gradient_relu_of_X: Assume that the input and output are represented by X and Y, respectively, such that Y = ReLU(X). This function should take dLoss/dY as input and return dLoss/dX.
f. gradient_softmax_of_X: Like gradient_relu_of_X, this function takes dLoss/dY as input and should return dLoss/dX. Work the gradient out on paper first, then implement it as efficiently as possible. A “for” loop over the batch is an acceptable implementation for this subtask. [Hint: An output element y_j does not depend on x_j alone, so you may need to use the Jacobian matrix here.]
g. backwardpass: Implement the backward pass here, using the above 2 functions. This function should only compute the gradients and store them in the appropriate member variables (which will be checked by the autograder), and not update the parameters. The function should also return the gradient with respect to its input (dLoss/dX), taking the gradient with respect to its output (dLoss/dY) as an input parameter.
h. updateWeights: This function uses the learning rate and the stored gradients to perform the actual parameter updates.
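For reference, here is a minimal sketch of the two activation functions, assuming the input X is a NumPy array of shape (batch_size, num_features); the standalone function form is only for illustration, and the row-wise max subtraction in softmax is a standard trick for numerical stability:

    import numpy as np

    def relu_of_X(X):
        # Element-wise ReLU over the (batch_size, num_features) input.
        return np.maximum(X, 0)

    def softmax_of_X(X):
        # Subtract the row-wise max before exponentiating for numerical stability;
        # each output row sums to 1, i.e. a distribution over the classes.
        shifted = X - np.max(X, axis=1, keepdims=True)
        exps = np.exp(shifted)
        return exps / np.sum(exps, axis=1, keepdims=True)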
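A matching sketch of the two activation gradients; the signatures used here (passing the cached forward input or output alongside dLoss/dY) are an assumption, so adapt them to whatever the provided stubs expect:

    import numpy as np

    def gradient_relu_of_X(X, dLdY):
        # dY/dX is 1 where X > 0 and 0 elsewhere, so dL/dX is dL/dY masked by X > 0.
        return dLdY * (X > 0)

    def gradient_softmax_of_X(Y, dLdY):
        # Y is the softmax output, shape (batch_size, num_classes).
        # Per example, the Jacobian is J = diag(y) - y y^T and dL/dx = J dL/dy
        # (J is symmetric). A loop over the batch is acceptable here.
        dLdX = np.zeros_like(dLdY)
        for i in range(Y.shape[0]):
            y = Y[i].reshape(-1, 1)                  # (num_classes, 1)
            jacobian = np.diagflat(y) - y @ y.T      # (num_classes, num_classes)
            dLdX[i] = jacobian @ dLdY[i]
        return dLdX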
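Finally, one possible shape of the remaining layer methods, shown for a ReLU-activated layer with hypothetical member names (self.weights, self.biases, self.gradW, self.gradB, self.data); treat this as a sketch of the gradient algebra, not as the exact structure the autograder expects:

    import numpy as np

    class FullyConnectedSketch:
        def __init__(self, in_nodes, out_nodes):
            # Small random weights and zero biases for inputs of shape (batch_size, in_nodes).
            self.weights = 0.01 * np.random.randn(in_nodes, out_nodes)
            self.biases = np.zeros((1, out_nodes))

        def forwardpass(self, X):
            # Affine transform followed by the activation (ReLU here); cache what
            # the backward pass will need in self.data.
            Z = X @ self.weights + self.biases
            self.data = (X, Z)
            return np.maximum(Z, 0)

        def backwardpass(self, dLdY):
            # dLdY is dLoss/dY for this layer's output Y. Push it back through the
            # activation, store the parameter gradients, and return dLoss/dX.
            X, Z = self.data
            dLdZ = dLdY * (Z > 0)                             # through the ReLU
            self.gradW = X.T @ dLdZ                           # dLoss/dWeights
            self.gradB = np.sum(dLdZ, axis=0, keepdims=True)  # dLoss/dBiases
            return dLdZ @ self.weights.T                      # dLoss/dX for the previous layer

        def updateWeights(self, lr):
            # Plain gradient-descent step using the gradients stored by backwardpass.
            self.weights -= lr * self.gradW
            self.biases -= lr * self.gradB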
Task 2 The NeuralNetwork class already has a defined __init__ function as well as a function to add layers to the network. You need to understand these functions and implement the following functions in the class (sketches follow at the end of this task):
a. crossEntropyLoss: Computes the cross entropy loss using the one-hot encoding of the ground-truth label and the output of the model.
b. crossEntropyDelta: Computes the gradient of the loss with respect to the model predictions P, i.e. d[crossEntropy(P, Y)] / dP, where Y refers to the ground-truth labels.
c. train: This function should use the batch size, learning rate and number of epochs to implement the entire training loop. Make sure that the activation used in the last layer is the softmax function, so that the output of the model is a probability distribution over the classes. You can use the validation set to compute the validation accuracy at different epochs (using the member functions of the NeuralNetwork class). Feel free to print different accuracies, losses and other statistics for debugging and hyperparameter tuning; it would, however, be preferable if you commented out or deleted all print statements in your final submission.
The train function will not be graded using the autograder. You will receive the marks for the datasets (Part B) only if you have satisfactorily implemented the train function.
There are also functions to make predictions and compute the test accuracy; these should not be modified.
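As a reference for parts (a) and (b) above, a sketch of the loss and its gradient, written as standalone functions and assuming P holds the predicted probabilities and Y the one-hot labels, both of shape (batch_size, num_classes):

    import numpy as np

    def crossEntropyLoss(P, Y):
        # Mean over the batch of -sum_j y_j * log(p_j); the 1e-8 avoids log(0).
        return -np.mean(np.sum(Y * np.log(P + 1e-8), axis=1))

    def crossEntropyDelta(P, Y):
        # d[crossEntropy(P, Y)] / dP for the loss above: -Y / P, averaged over the batch.
        return -Y / (P + 1e-8) / P.shape[0]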
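And a skeleton of one possible training loop for part (c); the network.layers attribute and the forwardpass/backwardpass signatures follow the sketches from Task 1 and are assumptions, not the required interface:

    import numpy as np

    def train_sketch(network, XTrain, YTrain, batch_size, learning_rate, num_epochs):
        n = XTrain.shape[0]
        for epoch in range(num_epochs):
            order = np.random.permutation(n)              # reshuffle every epoch
            for start in range(0, n, batch_size):
                idx = order[start:start + batch_size]
                batch_x, batch_y = XTrain[idx], YTrain[idx]

                # Forward pass through every layer; the last layer ends in softmax,
                # so the final activation is a probability distribution.
                activation = batch_x
                for layer in network.layers:
                    activation = layer.forwardpass(activation)

                # Backward pass: start from dLoss/dP and walk the layers in reverse.
                delta = network.crossEntropyDelta(activation, batch_y)
                for layer in reversed(network.layers):
                    delta = layer.backwardpass(delta)

                # Update every layer's parameters with the stored gradients.
                for layer in network.layers:
                    layer.updateWeights(learning_rate)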
Task 3 Finally, in main.py you need to define appropriate neural networks that give the best accuracy on both datasets. You also need to specify all the other hyperparameters, such as the batch size, learning rate and the number of epochs to train for. The relevant functions are:
a. taskXor
b. taskMnist
c. preprocessMnist: Perform any preprocessing you wish to do on the data here. [Hint: Some minimal basic preprocessing is needed for stable training; a sketch follows below.]
Do not modify the code in the rest of the function since it will be used to generate your final predictions.
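For preprocessMnist, a minimal sketch of the kind of preprocessing that usually suffices, assuming the images arrive as raw 0-255 pixel values:

    import numpy as np

    def preprocessMnist(X):
        # Scale the raw 0-255 pixel values to [0, 1], then standardise; without some
        # such normalisation the exponentials in softmax can overflow and training
        # becomes unstable.
        X = X.astype(np.float64) / 255.0
        return (X - X.mean()) / (X.std() + 1e-8)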
You are encouraged to make plots of the validation / training loss versus the number of epochs for different hyperparameters and note down any interesting or unusual observations. You can submit the same in a folder called “extra” in the main directory.
Tip: In case you are getting NaN as the loss value, make sure that whenever you divide by a variable that might be 0, you add a small constant to it, i.e. use 1/(x + 1e-8) instead of 1/x.