STATSM232A Project 5: Generator and Descriptor

1         Generator: real inference
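The model has the following form:

$$Y = f(Z; W) + \epsilon, \tag{1}$$

$$Z \sim \mathrm{N}(0, I_d), \quad \epsilon \sim \mathrm{N}(0, \sigma^2 I_D), \quad d < D. \tag{2}$$

f(Z;W) maps the latent factors Z into the image Y, where W collects all the connection weights and bias terms of the ConvNet.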

Adopting the language of the EM algorithm, the complete data model is given by

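$$\log p(Y, Z; W) = \log\left[p(Z)\, p(Y \mid Z, W)\right] \tag{3}$$

$$= -\frac{1}{2\sigma^2}\|Y - f(Z; W)\|^2 - \frac{1}{2}\|Z\|^2 + \text{const}. \tag{4}$$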

The observed-data model is obtained by integrating out Z: p(Y;W) = ∫ p(Z) p(Y|Z,W) dZ. The posterior distribution of Z is given by p(Z|Y,W) = p(Y,Z;W)/p(Y;W) ∝ p(Z) p(Y|Z,W) as a function of Z.

We want to maximize the observed-data log-likelihood, which is

$$L(W) = \sum_{i=1}^{n} \log p(Y_i; W) = \sum_{i=1}^{n} \log \int p(Y_i, Z_i; W)\, dZ_i.$$

The gradient of L(W) can be calculated according to the following well-known fact that underlies the EM algorithm:

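$$\frac{\partial}{\partial W}\log p(Y; W) = \frac{1}{p(Y; W)}\frac{\partial}{\partial W}\int p(Y, Z; W)\, dZ \tag{5}$$

$$= \mathrm{E}_{p(Z \mid Y, W)}\left[\frac{\partial}{\partial W}\log p(Y, Z; W)\right]. \tag{6}$$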
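The step from (5) to (6) can be spelled out as follows. This is a short sketch that assumes differentiation and integration can be exchanged, and it uses the identity ∂p/∂W = p · ∂ log p/∂W together with the definition of the posterior p(Z|Y,W) = p(Y,Z;W)/p(Y;W):

```latex
\begin{aligned}
\frac{\partial}{\partial W}\log p(Y; W)
&= \frac{1}{p(Y; W)}\int \frac{\partial}{\partial W}\, p(Y, Z; W)\, dZ \\
&= \int \frac{p(Y, Z; W)}{p(Y; W)}\,\frac{\partial}{\partial W}\log p(Y, Z; W)\, dZ \\
&= \int p(Z \mid Y, W)\,\frac{\partial}{\partial W}\log p(Y, Z; W)\, dZ .
\end{aligned}
```

The last line is the posterior expectation that appears in (6).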

The expectation with respect to p(Z|Y,W) can be approximated by drawing samples from p(Z|Y,W) and then computing the Monte Carlo average.

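The Langevin dynamics for sampling Z ∼ p(Z|Y,W) iterates

$$Z_{\tau+1} = Z_\tau + \delta U_\tau + \frac{\delta^2}{2}\left[\frac{1}{\sigma^2}\big(Y - f(Z_\tau; W)\big)\frac{\partial}{\partial Z} f(Z_\tau; W) - Z_\tau\right], \tag{7}$$

where τ denotes the time step for the Langevin sampling, δ is the step size, and U_τ denotes a random vector that follows N(0, I_d).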
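To make the update in equation (7) concrete, here is a minimal NumPy sketch. It is not the ConvNet code expected in GenNet.py: it assumes a toy linear generator f(Z; W) = W Z so that the gradient can be written by hand, and the function name `langevin_infer_z` and all parameter values are hypothetical choices for illustration.

```python
import numpy as np

def langevin_infer_z(y, z, W, sigma=1.0, delta=0.1, n_steps=30, rng=None):
    """Run n_steps of the Langevin update for a toy linear generator f(Z; W) = W @ Z.

    Assumption: the generator is linear, so (Y - f) * df/dZ equals W^T (Y - W Z).
    With a ConvNet generator this gradient would come from back-propagation.
    """
    rng = np.random.default_rng() if rng is None else rng
    for _ in range(n_steps):
        residual = y - W @ z                          # Y - f(Z_tau; W)
        grad_log_post = W.T @ residual / sigma**2 - z  # bracketed term of the update
        noise = rng.standard_normal(z.shape)           # U_tau ~ N(0, I_d)
        z = z + delta * noise + 0.5 * delta**2 * grad_log_post
    return z

# Toy usage: recover a 2-dim latent vector from a 16-dim observation.
rng = np.random.default_rng(0)
D, d = 16, 2
W = rng.standard_normal((D, d))
z_true = np.array([1.0, -1.0])
y = W @ z_true + 0.1 * rng.standard_normal(D)
z0 = np.zeros(d)
z_hat = langevin_infer_z(y, z0, W, sigma=0.1, delta=0.01, n_steps=200, rng=rng)
```

The small step size in the usage example is chosen so that the discretized dynamics stay stable when σ is small; a larger δ would need fewer steps but can diverge.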

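The stochastic gradient algorithm can be used for learning, where in each iteration, for each Z_i, only a single copy of Z_i is sampled from p(Z_i|Y_i,W) by running a finite number of steps of Langevin dynamics starting from the current value of Z_i, i.e., the warm start. With {Z_i} sampled in this manner, we can update the parameter W based on the gradient L′(W), whose Monte Carlo approximation is:

$$L'(W) \approx \frac{\partial}{\partial W}\sum_{i=1}^{n}\log p(Y_i, Z_i; W) \tag{8}$$

$$= -\sum_{i=1}^{n}\frac{\partial}{\partial W}\frac{1}{2\sigma^2}\|Y_i - f(Z_i; W)\|^2 \tag{9}$$

$$= \sum_{i=1}^{n}\frac{1}{\sigma^2}\big(Y_i - f(Z_i; W)\big)\frac{\partial}{\partial W} f(Z_i; W). \tag{10}$$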
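As a sanity check on the gradient in equation (10), the sketch below compares it with a finite-difference derivative of the complete-data log-likelihood from equations (3) and (4). It again assumes a toy linear generator f(Z; W) = W Z; the helper names `complete_data_loglik` and `grad_W` are hypothetical and not part of the project code.

```python
import numpy as np

def complete_data_loglik(Y, Z, W, sigma):
    """Sum over i of log p(Y_i, Z_i; W), up to a constant, for f(Z; W) = W Z."""
    residual = Y - Z @ W.T                      # rows are Y_i - f(Z_i; W)
    return -0.5 * np.sum(residual**2) / sigma**2 - 0.5 * np.sum(Z**2)

def grad_W(Y, Z, W, sigma):
    """Analytic gradient with respect to W for the linear toy generator."""
    residual = Y - Z @ W.T
    return residual.T @ Z / sigma**2            # sum_i (Y_i - W Z_i) Z_i^T / sigma^2

rng = np.random.default_rng(0)
n, D, d, sigma = 5, 4, 2, 0.5
Y = rng.standard_normal((n, D))
Z = rng.standard_normal((n, d))
W = rng.standard_normal((D, d))

# Central finite difference on a single entry of W.
eps = 1e-5
W_plus, W_minus = W.copy(), W.copy()
W_plus[1, 0] += eps
W_minus[1, 0] -= eps
numeric = (complete_data_loglik(Y, Z, W_plus, sigma)
           - complete_data_loglik(Y, Z, W_minus, sigma)) / (2 * eps)
analytic = grad_W(Y, Z, W, sigma)[1, 0]
```

In a ConvNet implementation the same gradient is normally obtained by back-propagating the reconstruction loss rather than by coding it by hand.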

Algorithm 1 describes the details of the learning and sampling algorithm.

Algorithm 1 Generator: real inference

Input:

(1)  training examples {Y_i, i = 1,...,n},

(2)  number of Langevin steps l,

(3)  number of learning iterations T.

Output:

(1)  learned parameters W,

(2)  inferred latent factors {Z_i, i = 1,...,n}.

1: Let t ← 0, initialize W.

2: Initialize Z_i, for i = 1,...,n.

3: repeat

4: Inference step: For each i, run l steps of Langevin dynamics to sample Z_i ∼ p(Z_i|Y_i,W) with warm start, i.e., starting from the current Z_i, each step following equation (7).

5: Learning step: Update W ← W + γ_t L′(W), where L′(W) is computed according to equation (10), with learning rate γ_t.

6: Let t ← t + 1.

7: until t = T
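The alternation in Algorithm 1 can be sketched end to end on a toy problem. The code below is only an illustration under stated assumptions: it uses a linear generator f(Z; W) = W Z instead of a ConvNet, it divides the gradient of equation (10) by n (which only rescales the learning rate), and the function name `train_toy_generator` and every hyperparameter value are hypothetical.

```python
import numpy as np

def train_toy_generator(Y, d=2, n_iters=200, n_langevin=10,
                        sigma=0.5, delta=0.05, lr=0.05, seed=0):
    """Alternate Langevin inference of Z and gradient updates of W (linear toy model)."""
    rng = np.random.default_rng(seed)
    n, D = Y.shape
    W = 0.1 * rng.standard_normal((D, d))       # step 1: initialize W
    Z = rng.standard_normal((n, d))             # step 2: initialize Z_i
    losses = []
    for t in range(n_iters):
        # Inference step: warm start from the current Z, n_langevin Langevin updates.
        for _ in range(n_langevin):
            residual = Y - Z @ W.T
            grad_z = residual @ W / sigma**2 - Z
            Z = Z + delta * rng.standard_normal(Z.shape) + 0.5 * delta**2 * grad_z
        # Learning step: gradient ascent on W, averaged over the n examples.
        residual = Y - Z @ W.T
        W = W + lr * (residual.T @ Z) / (n * sigma**2)
        # Track the average reconstruction loss for the loss-over-iteration plot.
        losses.append(0.5 * np.sum(residual**2) / (n * sigma**2))
    return W, Z, losses

# Toy data generated from a known 2-dim latent model.
data_rng = np.random.default_rng(1)
W_true = data_rng.standard_normal((8, 2))
Z_true = data_rng.standard_normal((100, 2))
Y = Z_true @ W_true.T + 0.1 * data_rng.standard_normal((100, 8))

W, Z, losses = train_toy_generator(Y)
```

The returned `losses` list is the kind of quantity one would plot against the iteration index; the learned W is only identified up to a rotation of the latent space, so it should not be compared entry by entry with W_true.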

 

1.1        TO DO
For the lion-tiger category, learn a model with a 2-dimensional latent factor vector. Fill in the blank parts of ./GenNet/GenNet.py. Show:

(1)  Reconstructed images of the training images, using the z inferred from the training images.

(2)  Randomly generated images, using randomly sampled z.

(3)  Generated images with linearly interpolated latent factors, where each dimension of z ranges over the interval (−2, 2). For example, interpolate 8 points from (−2, 2) for each dimension of z. You will then get an 8 × 8 panel of images. You should be able to see the tigers gradually change into lions.

(4)  Plot of the loss over iterations.
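For item (3), the 8 × 8 grid of latent vectors and the image panel can be built as in the sketch below. The helper names `latent_grid` and `tile_images` are hypothetical, the images are assumed to be stored as an array of shape (N, H, W, C), and the decoded images are replaced here by placeholder arrays because the generator itself is the part to be implemented in GenNet.py.

```python
import numpy as np

def latent_grid(low=-2.0, high=2.0, steps=8):
    """Return a (steps*steps, 2) array of latent vectors on a regular grid."""
    axis = np.linspace(low, high, steps)
    z1, z2 = np.meshgrid(axis, axis, indexing="ij")   # z1 varies along rows
    return np.stack([z1.ravel(), z2.ravel()], axis=1)

def tile_images(images, rows, cols):
    """Arrange (rows*cols, H, W, C) images into one (rows*H, cols*W, C) panel."""
    n, h, w, c = images.shape
    if n != rows * cols:
        raise ValueError("number of images must equal rows * cols")
    grid = images.reshape(rows, cols, h, w, c).transpose(0, 2, 1, 3, 4)
    return grid.reshape(rows * h, cols * w, c)

z_grid = latent_grid()

# Placeholder "decoded" images: image k is filled with the constant value k.
fake_images = np.arange(64, dtype=float).reshape(64, 1, 1, 1) * np.ones((1, 4, 4, 3))
panel = tile_images(fake_images, rows=8, cols=8)
```

In the actual project, `fake_images` would be replaced by the generator output for `z_grid`, so that row r and column c of the panel correspond to the r-th value of the first latent dimension and the c-th value of the second.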

2         Descriptor: real sampling
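The descriptor model is as follows:

$$p_\theta(Y) = \frac{1}{Z(\theta)}\exp\left[f_\theta(Y)\right] p_0(Y), \tag{11}$$

where p_0(Y) is the reference distribution such as Gaussian white noise

$$p_0(Y) = \frac{1}{(2\pi\sigma^2)^{D/2}}\exp\left[-\frac{\|Y\|^2}{2\sigma^2}\right]. \tag{12}$$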

The scoring function f_θ(Y) is defined by a bottom-up ConvNet whose parameters are denoted by θ. The normalizing constant Z(θ) = ∫ exp[f_θ(Y)] p_0(Y) dY is analytically intractable. The energy function is

$$\mathcal{E}_\theta(Y) = \frac{\|Y\|^2}{2\sigma^2} - f_\theta(Y). \tag{13}$$

p_θ(Y) is an exponential tilting of p_0.

Suppose we observe training examples {Y_i, i = 1,...,n} from an unknown data distribution P_data(Y). The maximum likelihood learning seeks to maximize the log-likelihood function

$$L(\theta) = \frac{1}{n}\sum_{i=1}^{n}\log p_\theta(Y_i). \tag{14}$$

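If the sample size n is large, the maximum likelihood estimator minimizes the Kullback-Leibler divergence KL(P_data ‖ p_θ) from the data distribution P_data to the model distribution p_θ. The gradient of L(θ) is

$$L'(\theta) = \frac{1}{n}\sum_{i=1}^{n}\frac{\partial}{\partial\theta} f_\theta(Y_i) - \mathrm{E}_\theta\left[\frac{\partial}{\partial\theta} f_\theta(Y)\right], \tag{15}$$

where E_θ denotes the expectation with respect to p_θ(Y). The key to the above identity is that ∂/∂θ log Z(θ) = E_θ[∂/∂θ f_θ(Y)].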
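The identity for the derivative of log Z(θ) follows directly from the definition of the normalizing constant. A short derivation, assuming the derivative can be taken inside the integral:

```latex
\begin{aligned}
\frac{\partial}{\partial\theta}\log Z(\theta)
&= \frac{1}{Z(\theta)}\int \frac{\partial}{\partial\theta}\exp\left[f_\theta(Y)\right] p_0(Y)\, dY \\
&= \int \frac{\partial f_\theta(Y)}{\partial\theta}\,
   \frac{1}{Z(\theta)}\exp\left[f_\theta(Y)\right] p_0(Y)\, dY \\
&= \mathrm{E}_\theta\left[\frac{\partial}{\partial\theta} f_\theta(Y)\right].
\end{aligned}
```

The second line recognizes the model density p_θ(Y) inside the integral, which turns the integral into an expectation under the model.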

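The expectation in equation (15) is analytically intractable and has to be approximated by MCMC, such as Langevin dynamics, which iterates the following step:

$$Y_{\tau+1} = Y_\tau - \frac{\delta^2}{2}\left[\frac{Y_\tau}{\sigma^2} - \frac{\partial}{\partial Y} f_\theta(Y_\tau)\right] + \delta U_\tau, \tag{16}$$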

where τ indexes the time steps of the Langevin dynamics, δ is the step size, and Uτ ∼ N(0,I) is Gaussian white noise. The Langevin dynamics relaxes Yτ to a low energy region, while the noise term provides randomness and variability. A Metropolis-Hastings step may be added to correct for the finite step size δ. We can also use Hamiltonian Monte Carlo for sampling the generative ConvNet.
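The Langevin step of equation (16) can be sketched in NumPy as below. To keep the example checkable, it assumes a toy linear scoring function f_θ(Y) = θ · Y, for which the model is a Gaussian with mean σ²θ; the function name `langevin_sample_y` and the parameter values are hypothetical, and in DesNet.py the gradient of f_θ with respect to Y would come from the ConvNet instead.

```python
import numpy as np

def langevin_sample_y(y, grad_f, sigma=1.0, delta=0.1, n_steps=100, rng=None):
    """Run n_steps of Langevin dynamics; grad_f(y) returns df_theta/dY at y."""
    rng = np.random.default_rng() if rng is None else rng
    for _ in range(n_steps):
        grad_energy = y / sigma**2 - grad_f(y)        # gradient of the energy in Y
        noise = rng.standard_normal(y.shape)          # U_tau ~ N(0, I)
        y = y - 0.5 * delta**2 * grad_energy + delta * noise
    return y

# Toy check: with f_theta(Y) = theta . Y and sigma = 1, samples should center on theta.
theta = np.array([1.0, -2.0])
rng = np.random.default_rng(0)
y0 = np.zeros((5000, 2))                              # 5000 parallel chains
samples = langevin_sample_y(y0, lambda y: theta, sigma=1.0,
                            delta=0.3, n_steps=200, rng=rng)
```

No Metropolis-Hastings correction is applied in this sketch, so the sample variance carries a small bias from the finite step size.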

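We can run ñ parallel chains of Langevin dynamics according to (16) to obtain the synthesized examples {Ỹ_i, i = 1,...,ñ}. The Monte Carlo approximation to L′(θ) is

$$L'(\theta) \approx \frac{1}{n}\sum_{i=1}^{n}\frac{\partial}{\partial\theta} f_\theta(Y_i) - \frac{1}{\tilde{n}}\sum_{i=1}^{\tilde{n}}\frac{\partial}{\partial\theta} f_\theta(\tilde{Y}_i), \tag{17}$$

which is used to update θ.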

To make Langevin sampling easier, we use the mean images of the training images as the starting points for sampling. That is, we down-sample each training image to a 1×1 patch and then up-sample this patch to the size of the training image. We use cold start for Langevin sampling, i.e., at each iteration, we start sampling from the mean images.
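One way to compute these mean images is sketched below. It assumes the images are stored as an array of shape (n, H, W, C), that down-sampling to a 1×1 patch means averaging over all pixels of each channel, and that up-sampling means replicating that patch; the function name `mean_images` is hypothetical.

```python
import numpy as np

def mean_images(images):
    """Replace every image by its per-channel mean, broadcast back to full size."""
    patch = images.mean(axis=(1, 2), keepdims=True)    # shape (n, 1, 1, C): the 1x1 patch
    return np.broadcast_to(patch, images.shape).copy() # replicate to (n, H, W, C)

rng = np.random.default_rng(0)
images = rng.uniform(size=(3, 8, 8, 3))
init = mean_images(images)
```

Because a cold start is used, this initialization would be recomputed or reused unchanged at the start of every learning iteration.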

Algorithm 2 describes the details of the learning and sampling algorithm.

Algorithm 2 Descriptor: real sampling

Input:

(1)  training examples {Y_i, i = 1,...,n},

(2)  number of Langevin steps l,

(3)  number of learning iterations T.

Output:

(1)  estimated parameters θ,

(2)  synthesized examples {Ỹ_i, i = 1,...,n}.

1: Let t ← 0, initialize θ.

2: repeat

3: For i = 1,...,n, initialize Ỹ_i to be the mean image of Y_i.

4: Run l steps of Langevin dynamics to evolve Ỹ_i, each step following equation (16).

5: Update θ_{t+1} = θ_t + γ_t L′(θ_t), with step size γ_t, where L′(θ_t) is computed according to equation (17).

6: Let t ← t + 1.

7: until t = T
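The loop in Algorithm 2 can be illustrated on a toy problem. The sketch below is not the DesNet.py solution: it assumes a linear scoring function f_θ(Y) = θ · Y, so that ∂f_θ/∂θ = Y and ∂f_θ/∂Y = θ, treats each training example as a short vector whose "mean image" is its own mean replicated across entries, and uses hypothetical names and hyperparameters throughout.

```python
import numpy as np

def train_toy_descriptor(Y, n_iters=300, n_langevin=30,
                         sigma=1.0, delta=0.3, lr=0.1, seed=0):
    """Cold-start Langevin sampling plus gradient ascent on theta (linear toy model)."""
    rng = np.random.default_rng(seed)
    n, D = Y.shape
    theta = np.zeros(D)                                   # step 1: initialize theta
    for t in range(n_iters):
        # Step 3: cold start, each chain begins at the "mean image" of its example.
        Y_syn = np.repeat(Y.mean(axis=1, keepdims=True), D, axis=1)
        # Step 4: run the Langevin dynamics for n_langevin steps.
        for _ in range(n_langevin):
            grad_energy = Y_syn / sigma**2 - theta        # df/dY = theta for the toy model
            Y_syn = (Y_syn - 0.5 * delta**2 * grad_energy
                     + delta * rng.standard_normal(Y_syn.shape))
        # Step 5: observed statistics minus synthesized statistics.
        grad_theta = Y.mean(axis=0) - Y_syn.mean(axis=0)
        theta = theta + lr * grad_theta
    return theta, Y_syn

data_rng = np.random.default_rng(1)
Y = data_rng.normal(loc=[1.0, -2.0], scale=1.0, size=(2000, 2))
theta, Y_syn = train_toy_descriptor(Y)
```

At convergence the update stops changing θ when the synthesized examples match the training examples in the statistic ∂f_θ/∂θ, which for this toy model is simply the mean. Because the chains are short and cold-started, the learned θ compensates for the unconverged sampler and need not equal the maximum likelihood estimate.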

 

2.1        TO DO
For the egret category, learn a descriptor model. Fill in the blank parts of ./DesNet/DesNet.py. Show:

(1)  Synthesized images.
