Objective
To implement a multi-layer neural network trained with the backpropagation algorithm to classify linearly non-separable data.
Background
A multi-layer neural network (MNN) implements linear discriminant functions, but in a feature space to which the input patterns are mapped non-linearly. Neural networks are quite powerful and easy to realize with simple algorithms, and the form of the non-linearity can be learned from the training data. One of the most popular methods for training an MNN is gradient descent on the error, commonly known as the backpropagation algorithm. Figure 1 shows an example of a three-layer neural network.
The task is to estimate the weight vectors wji (between the input and hidden layers) and wkj (between the hidden and output layers) for a given activation function. The pseudocode (11 steps) of the batch backpropagation algorithm is outlined below.
In the algorithm, the sensitivity of an output unit k is given by

δk = (tk − zk) · f′(netk)     (1)

and the sensitivity of a hidden unit j is given by

δj = f′(netj) · Σk wkj δk     (2)

Here netj and netk are the net activations, f(netj) and f(netk) are the non-linear activation functions, and f′(·) denotes the derivative of f(·).
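As a quick illustration, the two sensitivity computations of Eqs. (1) and (2) might be written as follows in NumPy, assuming a tanh activation so that f′(net) = 1 − tanh²(net); the function names are illustrative, not part of the algorithm:

import numpy as np

def output_sensitivity(t, z, net_k):
    # Eq. (1): delta_k = (t_k - z_k) * f'(net_k), with f = tanh
    return (t - z) * (1.0 - np.tanh(net_k) ** 2)

def hidden_sensitivity(net_j, w_kj, delta_k):
    # Eq. (2): delta_j = f'(net_j) * sum_k w_kj * delta_k
    # w_kj has one row per output unit k and one column per hidden unit j
    return (1.0 - np.tanh(net_j) ** 2) * (w_kj.T @ delta_k)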
Figure 1: A d − nH − c fully connected three-layer NN (bias not shown)
Backpropagation Algorithm:
begin initialize network topology (# hidden units), w, criterion θ, η, r ← 0
    do r ← r + 1 (increment epoch)
        m ← 0; Δwji ← 0; Δwkj ← 0
        do m ← m + 1
            xm ← select pattern
            Δwji ← Δwji + η δj xi; Δwkj ← Δwkj + η δk yj
        until m = n
        wji ← wji + Δwji; wkj ← wkj + Δwkj
    until ‖∇J(w)‖ < θ
    return w
end
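As a concrete reference, the following is a minimal NumPy sketch of the batch algorithm above for a d − nH − 1 network with bias units (one output, as needed for the exercises). The weight-initialization range, the epoch cap, and the default stopping threshold are illustrative assumptions rather than part of the pseudocode:

import numpy as np

def f(x):                 # activation: tanh, cf. Eq. (3) with a = b = 1
    return np.tanh(x)

def f_prime(x):           # derivative: 1 - tanh(x)^2
    return 1.0 - np.tanh(x) ** 2

def train_backprop(X, t, n_hidden, eta=0.1, theta=1e-3, max_epochs=20000, seed=0):
    """Batch backpropagation for a d-nH-1 network.
    X: (n, d) patterns; t: (n,) targets. Returns the weights and the
    per-epoch training error J(w) (the learning curve)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])               # append bias input
    w_ji = rng.uniform(-0.5, 0.5, (n_hidden, d + 1))   # input  -> hidden
    w_kj = rng.uniform(-0.5, 0.5, (1, n_hidden + 1))   # hidden -> output

    J_curve = []
    for epoch in range(max_epochs):                    # r <- r + 1
        dw_ji = np.zeros_like(w_ji)
        dw_kj = np.zeros_like(w_kj)
        for m in range(n):                             # accumulate over patterns
            x = Xb[m]
            net_j = w_ji @ x
            y = np.append(f(net_j), 1.0)               # hidden outputs + bias
            net_k = w_kj @ y
            z = f(net_k)
            delta_k = (t[m] - z) * f_prime(net_k)                  # Eq. (1)
            delta_j = f_prime(net_j) * (w_kj[:, :-1].T @ delta_k)  # Eq. (2)
            dw_kj += eta * np.outer(delta_k, y)
            dw_ji += eta * np.outer(delta_j, x)
        w_ji += dw_ji                                  # one batch update per epoch
        w_kj += dw_kj
        H = np.hstack([f(Xb @ w_ji.T), np.ones((n, 1))])
        z_all = f(H @ w_kj.T).ravel()
        J_curve.append(0.5 * np.sum((t - z_all) ** 2)) # J(w) = 1/2 sum (t - z)^2
        if np.sqrt(np.sum(dw_ji**2) + np.sum(dw_kj**2)) / eta < theta:
            break                                      # ||grad J(w)|| < theta
    return w_ji, w_kj, J_curve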
Laboratory Exercises
Construct a 2-2-1 neural network using the batch backpropagation algorithm to solve the classical XOR problem. The two inputs are given as x1 = [ −1 −1 1 1 ] and x2 = [ −1 1 −1 1 ]. The targets (correct outputs) are t = [ −1 1 1 −1 ]. Use η = 0.1 and threshold θ = 0. Assume the following sigmoid activation function (for both hidden and output units):
f(x) = a · tanh(bx)     (3)
where a = b = 1.
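For the gradient computations, note that the derivative of (3) is f′(x) = ab(1 − tanh²(bx)), which reduces to 1 − f(x)² when a = b = 1.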
Verify that the computed final weight vectors satisfy the XOR operation. Plot the learning curve and note the number of epochs needed for convergence.
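As a usage illustration, the XOR setup could be run with the train_backprop sketch given after the pseudocode above; the epoch count will vary with the random initial weights, and a small positive θ is used here since ‖∇J(w)‖ < 0 can never hold exactly:

X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
t = np.array([-1, 1, 1, -1], dtype=float)

w_ji, w_kj, J_curve = train_backprop(X, t, n_hidden=2, eta=0.1, theta=1e-3)
print("epochs to convergence:", len(J_curve))

# check the learned mapping against the XOR truth table
Xb = np.hstack([X, np.ones((4, 1))])
H = np.hstack([np.tanh(Xb @ w_ji.T), np.ones((4, 1))])
z = np.tanh(H @ w_kj.T).ravel()
print("sign of outputs:", np.sign(z))   # expect [-1, 1, 1, -1]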
Repeat the above for the 'Wine' data set from the UCI repository, using the test samples from Class 1 (ω1) and Class 3 (ω2) and only the features x1 = Alcohol and x2 = MalicAcid. Use class target labels ω1 = 1 and ω2 = −1.
Compute the classification accuracy for the given data. Plot the learning curve and note the number of epochs needed for convergence.
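A possible starting point for the Wine exercise is sketched below, again reusing train_backprop. The file layout assumed here (class label in column 0, Alcohol in column 1, Malic acid in column 2, comma-separated) matches the standard UCI wine.data file, but verify it against your copy; the feature standardization is an added assumption that simply keeps the tanh units out of saturation:

import numpy as np

# wine.data: column 0 = class (1..3), column 1 = Alcohol, column 2 = Malic acid
data = np.loadtxt("wine.data", delimiter=",")
mask = (data[:, 0] == 1) | (data[:, 0] == 3)      # keep Class 1 and Class 3
X = data[mask][:, 1:3]                            # x1 = Alcohol, x2 = MalicAcid
t = np.where(data[mask][:, 0] == 1, 1.0, -1.0)    # omega1 = +1, omega2 = -1
X = (X - X.mean(axis=0)) / X.std(axis=0)          # standardize (assumption)

w_ji, w_kj, J_curve = train_backprop(X, t, n_hidden=2, eta=0.1)

# classification accuracy on the given data
n = len(X)
Xb = np.hstack([X, np.ones((n, 1))])
H = np.hstack([np.tanh(Xb @ w_ji.T), np.ones((n, 1))])
z = np.tanh(H @ w_kj.T).ravel()
print("accuracy:", np.mean(np.sign(z) == t), "epochs:", len(J_curve))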