Machine Learning 2 - Exercise Sheet 10

Exercise 1: Mixture Density Network
In this exercise, we prove some of the results from the paper "Mixture Density Networks" by Bishop (1994). The mixture density network is given by

$p(t|x) = \sum_{i=1}^{m} \alpha_i(x)\,\phi_i(t|x)$

with the mixture elements (spherical Gaussian kernels over the $c$-dimensional target $t$, as in Bishop, 1994)

$\phi_i(t|x) = \dfrac{1}{(2\pi)^{c/2}\,\sigma_i(x)^c}\exp\!\left(-\dfrac{\|t-\mu_i(x)\|^2}{2\,\sigma_i(x)^2}\right).$

The contribution to the error function of one data point q is given by

$E_q = -\log\Big\{\sum_{i=1}^{m} \alpha_i(x_q)\,\phi_i(t_q|x_q)\Big\}.$

We also define the posterior distribution

 

$\pi_i(t_q, x_q) = \dfrac{\alpha_i(x_q)\,\phi_i(t_q|x_q)}{\sum_{j=1}^{m} \alpha_j(x_q)\,\phi_j(t_q|x_q)},$

which is obtained using Bayes' theorem.

(a)     Compute the gradient of the error $E_q$ w.r.t. the mixture parameters, i.e. show that

 

(b)    We now assume that the neural network produces the mixture coefficients as:

$\alpha_i(x) = \dfrac{\exp(z_i^{\alpha})}{\sum_{j=1}^{m} \exp(z_j^{\alpha})}$

where $z^{\alpha}$ denotes the outputs of the neural network (after the last linear layer) associated with these mixture coefficients. Using the chain rule for derivatives (i.e. by reusing some of the results from the first part of this exercise), compute the derivative $\partial E_q / \partial z_i^{\alpha}$.
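A useful intermediate step for part (b) is the Jacobian of the softmax parametrization above; the following short sketch assumes the mixture coefficients are indeed given by that softmax:

$\dfrac{\partial \alpha_k}{\partial z_i^{\alpha}} = \dfrac{\partial}{\partial z_i^{\alpha}}\,\dfrac{\exp(z_k^{\alpha})}{\sum_j \exp(z_j^{\alpha})} = \alpha_k\,(\delta_{ki} - \alpha_i), \qquad \delta_{ki} = 1 \text{ if } k = i \text{ and } 0 \text{ otherwise.}$

Combining this with the gradient from part (a) via the chain rule yields the requested derivative $\partial E_q / \partial z_i^{\alpha}$.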

Exercise 2: Conditional RBM 
The conditional restricted Boltzmann machine is a system of binary variables comprising inputs $x \in \{0,1\}^d$, outputs $y \in \{0,1\}^c$, and hidden units $h \in \{0,1\}^K$. It associates to each configuration of these binary variables the energy:

$E(x,y,h) = -x^{\top} W h - y^{\top} U h$

and the probability associated with each configuration is then given as:

$p(x,y,h) = \dfrac{1}{Z}\exp(-E(x,y,h))$

where $Z$ is a normalization constant that makes the probabilities sum to one.

(a)     Let $\mathrm{sigm}(t) = \exp(t)/(1 + \exp(t))$ be the sigmoid function. Show that

(i)      $p(h_k = 1 \mid x, y) = \mathrm{sigm}\big(\sum_i x_i W_{ik} + \sum_j y_j U_{jk}\big)$

(ii)    $p(y_j = 1 \mid h, x) = \mathrm{sigm}\big(\sum_k U_{jk} h_k\big)$
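As a sketch of the argument for (i), using only the energy defined above: the energy is linear in $h_k$ and the hidden units are conditionally independent given $x$ and $y$, so the normalization reduces to the two states $h_k \in \{0,1\}$:

$p(h_k = 1 \mid x, y) = \dfrac{\exp\big((x^\top W + y^\top U)_k\big)}{1 + \exp\big((x^\top W + y^\top U)_k\big)} = \mathrm{sigm}\big((x^\top W + y^\top U)_k\big).$

The same two-state argument applied to $y_j$ gives (ii).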

(b)    Show that

$p(y \mid x) = \dfrac{1}{Z}\exp(-F(x,y))$

where

$F(x,y) = -\sum_{k=1}^{K}\log\Big(1 + \exp\big((x^\top W + y^\top U)_k\big)\Big)$

is the free energy and where $Z$ is again a normalization constant.
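A sketch of where the free energy comes from, obtained by marginalizing the joint distribution over the hidden units (this uses only the energy and probability defined above):

$\sum_{h \in \{0,1\}^K} \exp(-E(x,y,h)) = \sum_{h} \prod_{k=1}^{K} \exp\big(h_k\,(x^\top W + y^\top U)_k\big) = \prod_{k=1}^{K}\Big(1 + \exp\big((x^\top W + y^\top U)_k\big)\Big) = \exp(-F(x,y)).$

Summing the joint distribution over $h$ and normalizing over $y$ then gives $p(y \mid x) = \frac{1}{Z}\exp(-F(x,y))$.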

Exercise 3: Programming 
Download the programming files on ISIS and follow the instructions.

Exercise sheet 10 (programming)                                                                                      

 

MNIST Inpainting with Energy-Based Learning
In this exercise, we consider the task of inpainting incomplete handwritten digits, and for this, we would like to make use of neural networks and the Energy-Based Learning framework.

 

As a first step, we load the MNIST dataset
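The loading cell itself is not reproduced in this export; the following is a minimal sketch of how the data could be loaded, assuming torchvision is available and denoting the flattened training and test images by X and Z (these names are assumptions, not the notebook's):

```python
# Minimal sketch (assumptions: torchvision is available, images are flattened to 784-dim vectors)
import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.ToTensor(),                    # scale pixel values to [0, 1]
    transforms.Lambda(lambda t: t.view(-1)),  # flatten each 28x28 image to a 784-dim vector
])

mnist_train = datasets.MNIST(root='./data', train=True,  download=True, transform=transform)
mnist_test  = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

# Stack all images into plain tensors: X is (60000, 784), Z is (10000, 784)
X = torch.stack([img for img, _ in mnist_train])
Z = torch.stack([img for img, _ in mnist_test])
```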

 

We consider the following perturbation process, which randomly draws a region near the center of the image and sets the pixels in this area to some gray value.
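The corresponding cell is also missing from this export; the sketch below shows one way such a perturbation could be implemented, assuming square patches of a fixed size placed at a random offset near the image center and filled with a constant gray value:

```python
import torch

def perturb(images, patch_size=8, gray=0.5):
    """Set a randomly placed square patch near the image center to a gray value.

    images: flattened 28x28 digits of shape (N, 784).
    Returns the perturbed images and a boolean mask marking the occluded pixels.
    """
    x = images.view(-1, 28, 28).clone()
    mask = torch.zeros_like(x, dtype=torch.bool)
    for img, m in zip(x, mask):
        # random top-left corner such that the patch stays near the center of the digit
        i = torch.randint(6, 22 - patch_size + 1, (1,)).item()
        j = torch.randint(6, 22 - patch_size + 1, (1,)).item()
        img[i:i + patch_size, j:j + patch_size] = gray
        m[i:i + patch_size, j:j + patch_size] = True
    return x.view(-1, 784), mask.view(-1, 784)
```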

 

 PCA Reconstruction 
A simple technique for inpainting an image is principal component analysis. It consists of taking the incomplete image and projecting it onto the d principal components of the training data.

Task:

  Implement a function that takes a collection of test examples z and projects them onto the d principal components of the training data x (one possible implementation is sketched below).
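A sketch of one possible implementation, assuming the training data x and the test examples z are given as (N, 784) and (M, 784) tensors (the exact interface expected by the notebook may differ):

```python
import torch

def pca_reconstruct(z, x, d):
    """Project test examples z onto the d leading principal components of x."""
    mean = x.mean(dim=0)
    # Principal components = leading right singular vectors of the centered training data
    _, _, v = torch.svd(x - mean)
    components = v[:, :d]                                  # (784, d)
    # Project the centered test points onto the subspace and map back to pixel space
    return mean + (z - mean) @ components @ components.t()
```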

 

The PCA-based inpainting technique is tested below on 10 test points for which a patch is missing. We observe that the patch-like perturbation is less severe when d is low, but the reconstructed part of the digit appears blurry. Conversely, when d is high, more detail becomes available, but the missing pattern appears more prominent.

 

Energy-Based Learning 
We now consider the energy-based learning framework where we learn an energy function to discriminate between correct and incorrect reconstructions.

 

To be able to generate good contrastive examples (i.e. incorrect reconstructions that are still plausible enough to confuse the energy-based model and from which a meaningful gradient signal can be extracted), we consider a generator network that takes the incomplete images as input.
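The notebook provides these two networks; since the corresponding cells are not reproduced here, the following is only a rough sketch of what such a pair of networks could look like (the layer sizes and variable names are assumptions, not the provided architecture):

```python
import torch.nn as nn

# Energy-based model: maps a (possibly reconstructed) digit to a scalar energy
energy_model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

# Generator: maps an incomplete digit to a full reconstruction with pixel values in [0, 1]
generator = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 784), nn.Sigmoid(),
)
```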

 

The whole architecture is depicted in the diagram below:

 

The two networks are then jointly optimized. The structure of the optimization problem is already provided to you; however, the code that computes the forward pass from the input data up to the error function is missing.

Task:

  Write the code that computes the error function. Here, we use a single optimizer and must therefore implement the gradient flip trick described in the slides (see the sketch below). A similar trick can be used to let the gradient flow into the generator only via the missing image patch and not through all pixels.
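A sketch of the gradient flip part, implemented here with a small custom autograd function that acts as the identity on the forward pass and negates the gradient on the backward pass (the error function below is only one plausible choice; the variable names and the exact objective used in the notebook may differ):

```python
import torch

class GradFlip(torch.autograd.Function):
    """Identity in the forward pass, negated gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

def error(x_true, x_incomplete, mask, generator, energy_model):
    # Reconstruct with the generator, but only fill in the missing patch; the gradient
    # flip lets a single optimizer push the energy model and the generator in opposite
    # directions (the generator is trained to produce low-energy, i.e. realistic, patches).
    x_gen = generator(x_incomplete)
    x_fake = torch.where(mask, GradFlip.apply(x_gen), x_incomplete)
    # Energy-based objective: low energy for true digits, high energy for reconstructions
    return torch.mean(energy_model(x_true) - energy_model(x_fake))
```

Because the observed pixels are taken directly from x_incomplete, the gradient reaches the generator only through the masked patch, which is the second trick mentioned in the task.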

 


0 tensor(0.6744, grad_fn=<MeanBackward0>)

10 tensor(0.1035, grad_fn=<MeanBackward0>)

20 tensor(0.2146, grad_fn=<MeanBackward0>)

30 tensor(0.3614, grad_fn=<MeanBackward0>)

40 tensor(0.3134, grad_fn=<MeanBackward0>)

50 tensor(0.3515, grad_fn=<MeanBackward0>)

60 tensor(0.4389, grad_fn=<MeanBackward0>)

70 tensor(0.3787, grad_fn=<MeanBackward0>)

80 tensor(0.4541, grad_fn=<MeanBackward0>)

90 tensor(0.4365, grad_fn=<MeanBackward0>)

After optimizing for a sufficient number of epochs, the solution has ideally come close to a Nash equilibrium where both the generator and the energy-based model perform well. In particular, the generator should generate examples that look similar to the true examples. The code below plots the incomplete digits and the reconstructions obtained by the generator network.
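The plotting cell is not included in this export; the following is a minimal sketch of such a visualization, reusing the (assumed) names from the sketches above:

```python
import torch
import matplotlib.pyplot as plt

x_incomplete, mask = perturb(Z[:10])   # 10 perturbed test digits (see the perturbation sketch)
with torch.no_grad():
    x_rec = torch.where(mask, generator(x_incomplete), x_incomplete)

fig, axes = plt.subplots(2, 10, figsize=(12, 3))
for k in range(10):
    axes[0, k].imshow(x_incomplete[k].view(28, 28), cmap='gray')  # incomplete digit
    axes[1, k].imshow(x_rec[k].view(28, 28), cmap='gray')         # generator reconstruction
    axes[0, k].axis('off')
    axes[1, k].axis('off')
plt.show()
```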

 

As we can see, although some artefacts still persist, the reconstructions are quite plausible and look better than those obtained with the simple PCA-based approach. Note, however, that the procedure is also more complex and computationally more demanding than a simple PCA-based reconstruction.
