$40
This programming assignment has three main learning goals. First, the assignment will provide you with an opportunity to practice developing programs in Python using the PyCharm integrated development environment (IDE). Second, this assignment will give you some experience building artificial neural networks using the PyTorch system. Third, by encouraging you to examine the learning performance of such networks, this assignment will help you develop insights concerning the strengths and weaknesses of various network architectures.
You will be provided with Python source code that (incompletely) implements three artificial neural network models and trains them to solve a classic classification task. You are charged with completing this implementation by writing Python code that describes three network architectures: a linear network that directly connects inputs to outputs, a network with a single hidden layer, and a network with two hidden layers.
.
Activities
Overview
The PyTorch system is currently one of the leading software tools for making artificial intelligence systems using deep artificial neural networks. It provides fairly abstract methods for building such machine learning models, training them on data, and applying the trained models to novel inputs. This assignment provides you with some introductory experience using PyTorch software.
When complete, this assignment results in a Python program that constructs three different network architectures:
• AnnLinear — A network with no hidden layer, connecting a 4-element input vector to a 3-element linear output layer.
• AnnOneHid — A network with a single hidden layer of 20 processing units, mediating the production of a 3-element output vector from a 4-element input vector.
• AnnTwoHid — A network with two successive hidden layers between a 4-element input and a 3-element output, with the first hidden layer having 16 processing units and the second having 12 processing units.
The program trains each constructed network on a simple classification task, reporting progress during training and calculating the accuracy of each learned model on a set of testing examples.
The classification task is one that has been studied for many years. The goal is to learn to identify the species of an iris plant based on the size of various parts of the plant. Each plant is a member of one of three possible species: setosa, versicolor, or virginica. Four numerical features of the plant are input to allow the system to categorize it by species. The four input measurements are sepal length and width and petal length and width. A total of 150 examples are provided for use as training data and testing data. Details concerning the data set may be found at the UC Irvine Machine Learning Repository:
https://archive.ics.uci.edu/ml/datasets/iris
This is a fairly simple classification task to learn, making it easy to produce models with high categorization accuracy, even on previously unseen inputs.
Python Packages
This program makes use of three publicly available Python packages, all of which need to be installed in the PyCharm IDE before the program can be successfully run.
• PyTorch (torch) — tools for building artificial neural network models
• Pandas (pandas) — tools used for data loading and manipulation
• Scikit-learn (scikit-learn) — tools used to randomly sample training and testing sets from a collection of examples
These packages can be easily installed in PyCharm. On the Preferences window (or the Settings window), there is an option on the project pane which is labeled “Python Interpreter”. Selecting this option shows a table of installed packages. The plus-sign button may be selected to install a new package. The installation window contains a search bar, through which the three needed packages may be found. Once found, a package may be installed from this window by selecting the “Install Package” button.
Program Structure
The program is implemented in three Python script files. One of these files is incomplete.
• “main.py” — This file provides code to train and test all three network architectures, printing performance results.
• “ann.py” — This file provides three utility functions.
– loadirisdata — Load the iris classification data set from the UC Irvine Machine Learning Repository.
– trainmodel — Train a given network model using a specified set of examples.
– testmodel — Calculate the accuracy of a given network model on a specified set of testing examples.
• “arch.py” — This file defines three Python classes describing the three artificial neural network architectures of interest.
Of these three files, only the third, “arch.py”, is incomplete. A template is provided, but a solution to this assignment must provide the code necessary to implement the three architectures.
Architecture Specification
In PyTorch, an artificial neural network architecture is specified by defining a new Python class that inherits from nn.Module. The “arch.py” file defines three such classes: AnnLinear, AnnOneHid, and AnnTwoHid. Each class has an initialization method, called when constructing an instance, and a method that performs forward pass computations to produce network outputs.
In the initialization method, network layers are defined. Each layer should be stored in a class variable. For this assignment, all layers calculate their “net input” values as a linear weighted sum of inputs, with the weights being learned parameters. In PyTorch, such a layer may be declared using nn.Linear in the following way:
self.my_layer = nn.Linear(in_features=6, out_features=9)
In this example line of Python code, a layer with 9 processing units is defined, with the layer receiving 6-dimensional inputs (e.g., the number of processing units in the preceding layer). The layer is stored in the “my layer” variable.
In the method that implements the forward pass, aptly named “forward”, processing element activation levels are successively calculated across layers, from inputs to outputs. The method is given network inputs as an argument, conventionally labeled “x”. Weighted sums, calculating “net input” values, can be computed by treating layer variables as functions. For example, using “my layer” as previously defined, the “net input” of the layer can be computed using this line of Python code:
my_layer_net = self.my_layer(x)
An activation function can be applied to the “net input” of a layer. For example, the rectified linear (ReLU) activation function may be applied as follows:
my_layer_act = F.relu(self.my_layer(x))
In this way, the activation of each layer can be calculated, from inputs to outputs.
Note that a few variable naming conventions are repeatedly used. Typically, “x” references an input to the network. The target outputs of the network are referenced as “y”. The actual outputs of the network, to be compared with the targets, are referenced as “y hat” (spelling out yˆ). Thus, the forward method takes x as an argument and should return the corresponding value of y hat.
In this assignment, all three network architectures must take a 4-dimensional input, corresponding to the four measurements of the iris plants. Each architecture must provide a 3-dimensional output, with one element for each of the three species categories. The output element with the highest activation is taken as indicating the predicted species corresponding to the current network input. The network architecture with a single hidden layer should use 20 hidden units. The network architecture with two hidden layers should have 16 units in the first hidden layer and 12 in the second. All hidden layers should use the ReLU activation function, but network output layers should not. Output layers should compute the linear weighted sum of their inputs, without any nonlinear activation function.
Analysis
The “arch.py” file is to be completed, filling in code for the two methods in each of the three architecture classes. The header comment of this file should also contain answers to two analysis questions.
Once the artificial neural network architectures are implemented, running the program will display measures of training progress and testing set accuracy for each of the three architectures. The program should be run a dozen times or more, recording the displayed results. Because testing examples are sampled at random on each run, and each network begins with random initial weights, each run can produce different results. Still, patterns in performance should be evident. Based on your observations of many executions of the program, you should answer the following two questions:
1. Which network architecture achieves the lowest training set error?
2. Which network architecture tends to exhibit the best testing set accuracy?
With a little thought, you should be able to understand why the observed patterns of performance arose.