Machine Learning 2 - Exercise Sheet 12 (Solved)

Exercise 1: Deep SVDD 
Consider a dataset x1,...,xN ∈ R^d, and a simple linear feature map φ(x) = w⊤x + b with trainable parameters w and b. For this simple scenario, we can formulate the deep SVDD problem as:

min_{w,b}  (1/N) ∑_{n=1}^{N} ‖w⊤xn + b − 1‖²

where we have hardcoded the center parameter of deep SVDD to 1. We then classify new points x as anomalous if ‖w⊤x + b − 1‖² > τ.

(a) Give a choice of parameters (w, b) that minimizes the objective above for any dataset (x1,...,xN).
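
Solution sketch: the objective is a sum of squares and hence bounded below by zero. The choice

w = 0,  b = 1

gives φ(xn) = 1 for every n, so the objective attains its minimum value 0 for any dataset. Note that this solution collapses the feature map to a constant.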

(b) We now consider a regularizer for our feature map φ which simply consists of forcing the bias term to b = 0. Show that under this regularizer, the solution of deep SVDD is given by:

w = Σ⁻¹x̄

where x̄ and Σ are the empirical mean and uncentered covariance.
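
Solution sketch: with b = 0 the objective becomes (1/N) ∑_{n=1}^{N} (w⊤xn − 1)², and its gradient with respect to w is

(2/N) ∑_{n=1}^{N} xn (w⊤xn − 1) = 2(Σw − x̄),  with Σ = (1/N) ∑_n xn xn⊤ and x̄ = (1/N) ∑_n xn.

Setting the gradient to zero gives Σw = x̄, i.e. w = Σ⁻¹x̄ (assuming Σ is invertible).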

Exercise 2: Restricted Boltzmann Machine 
The restricted Boltzmann machine is a system of binary variables comprising inputs x ∈ {0,1}^d and hidden units h ∈ {0,1}^K. It associates to each configuration of these binary variables the energy:

E(x,h) = −x⊤Wh − b⊤h

and the probability associated to each configuration is then given as:

p(x,h) = exp(−E(x,h)) / Z

where Z is a normalization constant that makes probabilities sum to one. Let sigm(t) = exp(t)/(1+exp(t)) be the sigmoid function.

(a) Show that p(hk = 1|x) = sigm(x⊤W:,k + bk).

(b) Show that p(xj = 1|h) = sigm(Wj,:h).
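
Hint for (a) and (b): given x, the energy is linear in each hk, so the hidden units are conditionally independent. Comparing the unnormalized probabilities of hk = 1 and hk = 0 gives

p(hk = 1|x) = exp(x⊤W:,k + bk) / (1 + exp(x⊤W:,k + bk)) = sigm(x⊤W:,k + bk),

and the same argument applied to each xj given h yields (b).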

(c) Show that

p(x) = exp(−F(x)) / Z

where

F(x) = −∑_{k=1}^{K} log(1 + exp(x⊤W:,k + bk))

is the free energy and where Z is again a normalization constant.
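
Hint: marginalizing the joint distribution over the hidden units factorizes over k:

p(x) = ∑_h p(x,h) = (1/Z) ∑_h exp(x⊤Wh + b⊤h) = (1/Z) ∏_{k=1}^{K} (1 + exp(x⊤W:,k + bk)) = (1/Z) exp(−F(x)).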

Exercise 3: Programming 
Download the programming files on ISIS and follow the instructions.

Exercise sheet 12 (programming)                                                                                      

 

KDE and RBM for Anomaly Detection
In this programming exercise, we compare two energy-based models in the context of anomaly detection: kernel density estimation (KDE) and the restricted Boltzmann machine (RBM).

 

We consider the MNIST dataset and define the class "0" to be normal (inlier) and the remaining classes (1–9) to be anomalous (outlier). We assume a training set Xr composed of 100 normal data points. The variables Xi and Xo denote normal and anomalous test data.

 

 

Kernel Density Estimation 
We first consider kernel density estimation, a shallow model for anomaly detection. The code below implements kernel density estimation.

Task:

  Implement the function energy that returns the energy of the points X given as input, as computed by the KDE energy function (cf. slide "Kernel Density Estimation as an EBM").
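
A minimal sketch of such a function, assuming the EBM view of KDE with energy E(x) = −log((1/N) ∑_n exp(−γ‖x − xn‖²)); the standalone helper kde_energy and its arguments are illustrative, whereas in the exercise the training data and gamma are attributes of the model class:

import numpy
from scipy.special import logsumexp

def kde_energy(X, Xr, gamma):
    # Pairwise squared Euclidean distances between test points X and training points Xr
    D = (X**2).sum(axis=1)[:,None] + (Xr**2).sum(axis=1)[None,:] - 2*X.dot(Xr.T)
    # KDE as an EBM: E(x) = -log( (1/N) * sum_n exp(-gamma*||x - x_n||^2) ),
    # computed with logsumexp for numerical stability
    return -(logsumexp(-gamma*D, axis=1) - numpy.log(Xr.shape[0]))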

 

The following code applies KDE with different scale parameters gamma and reports the performance of the resulting anomaly detection model, measured in terms of area under the ROC curve (AUROC).

 

gamma = 0.028  AUROC = 0.969
gamma = 0.046  AUROC = 0.976
gamma = 0.077  AUROC = 0.981
gamma = 0.129  AUROC = 0.983
gamma = 0.215  AUROC = 0.983
gamma = 0.359  AUROC = 0.982
gamma = 0.599  AUROC = 0.982
gamma = 1.000  AUROC = 0.981

We observe that the best performance (AUROC = 0.983) is obtained for intermediate values of the parameter gamma (here, around 0.129–0.215).

Restricted Boltzmann Machine 
We now consider a restricted Boltzmann machine composed of 100 binary hidden units (h ∈ {0,1}^100). The joint energy function of our RBM is given by:

E(x,h) = −x⊤a − x⊤Wh − h⊤b
The model can be marginalized over its hidden units and the energy function that depends only on the input x is then given as:

E(x) = −x⊤a − ∑_{k=1}^{100} log(1 + exp(x⊤W:,k + bk))

The RBM training algorithm is already implemented for you.

Tasks:

  Implement the energy function E(x).

  Augment the function fit with code that prints the AUROC every 100 iterations.

A possible completion of both tasks is sketched after the code listing below.


def sigm(t): return numpy.tanh(0.5*t)*0.5+0.5
def realize(t): return 1.0*(t>numpy.random.uniform(0,1,t.shape))

class RBM(AnomalyModel):

    def __init__(self,X,h):
        self.mb = X.shape[0]
        self.d  = X.shape[1]
        self.h  = h
        self.lr = 0.1

        # Model parameters
        self.A = numpy.zeros([self.d])
        self.W = numpy.random.normal(0,self.d**-.25 * self.h**-.25,[self.d,self.h])
        self.B = numpy.zeros([self.h])

    def fit(self,X,verbose=False):

        # Persistent chain of model samples for PCD
        Xm = numpy.zeros([self.mb,self.d])

        for i in numpy.arange(1001):

            # Gibbs sampling (PCD)
            Xd = X*1.0
            Zd = realize(sigm(Xd.dot(self.W)+self.B))
            Zm = realize(sigm(Xm.dot(self.W)+self.B))
            Xm = realize(sigm(Zm.dot(self.W.T)+self.A))

            # Update parameters
            self.W += self.lr*((Xd.T.dot(Zd) - Xm.T.dot(Zm)) / self.mb - 0.01*self.W)
            self.B += self.lr*(Zd.mean(axis=0)-Zm.mean(axis=0))
            self.A += self.lr*(Xd.mean(axis=0)-Xm.mean(axis=0))

            if verbose:
                # ------------------------------------------------
                # TODO: Replace by your code
                # ------------------------------------------------
                import solution
                solution.track_auroc(self,i)
                # ------------------------------------------------

    def energy(self,X):
        # ------------------------------------------------
        # TODO: Replace by your code
        # ------------------------------------------------
        import solution
        E = solution.rbm_energy(self,X)
        # ------------------------------------------------
        return E
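
A possible completion of the two TODO blocks, sketched under the following assumptions: the energy implements the marginalized E(x) given above, and the AUROC tracker assumes the test splits Xi (normal) and Xo (anomalous) as well as sklearn are available in scope:

import numpy
from sklearn.metrics import roc_auc_score

def rbm_energy(self, X):
    # Marginalized energy: E(x) = -x^T a - sum_k log(1 + exp(x^T W[:,k] + b_k));
    # numpy.logaddexp(0, t) computes log(1 + exp(t)) in a numerically stable way
    return -X.dot(self.A) - numpy.logaddexp(0, X.dot(self.W) + self.B).sum(axis=1)

def track_auroc(self, i):
    # Print the AUROC every 100 iterations; anomalous points should receive higher energy
    if i % 100 == 0:
        scores = numpy.concatenate([rbm_energy(self, Xi), rbm_energy(self, Xo)])
        labels = numpy.concatenate([numpy.zeros(len(Xi)), numpy.ones(len(Xo))])
        print('it = %5d  AUROC = %.3f' % (i, roc_auc_score(labels, scores)))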

We now train our RBM on the same data as the KDE model for approximately 1000 iterations.

 

it =    0  AUROC = 0.962
it =  100  AUROC = 0.943
it =  200  AUROC = 0.985
it =  300  AUROC = 0.987
it =  400  AUROC = 0.988
it =  500  AUROC = 0.986
it =  600  AUROC = 0.987
it =  700  AUROC = 0.987
it =  800  AUROC = 0.989
it =  900  AUROC = 0.986
it = 1000  AUROC = 0.990

We observe that the RBM reaches superior levels of AUROC performance compared to the simple KDE model. An advantage of the RBM model is that it learns a set of parameters that represent variations at multiple scales and with specific orientations in input space. We would like to visualize these parameters:

Task:

  Render as a mosaic the weight parameters W of the model. Each tile of the mosaic should correspond to the receptive field connecting the input image to a particular hidden unit.
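
A minimal sketch, assuming 28×28 MNIST inputs, the 100 hidden units arranged on a 10×10 grid, and a trained model instance named rbm (the helper name plot_mosaic is illustrative):

import matplotlib.pyplot as plt

def plot_mosaic(W, nrows=10, ncols=10, px=28):
    # Each column of W (shape d x h = 784 x 100) is the receptive field of one
    # hidden unit; reshape each to px x px and tile the units into a grid
    tiles = W.T.reshape(nrows, ncols, px, px)
    mosaic = tiles.transpose(0, 2, 1, 3).reshape(nrows*px, ncols*px)
    plt.figure(figsize=(8, 8))
    plt.imshow(mosaic, cmap='gray')
    plt.axis('off')
    plt.show()

plot_mosaic(rbm.W)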
