1 Generator: real inference
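The model has the following form:
$$Y = f(Z; W) + \epsilon, \tag{1}$$
$$Z \sim \mathrm{N}(0, I_d), \quad \epsilon \sim \mathrm{N}(0, \sigma^2 I_D), \quad d < D. \tag{2}$$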
f(Z;W) maps latent factors into image Y , where W collects all the connection weights and bias terms of the ConvNet.
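Adopting the language of the EM algorithm, the complete-data model is given by
$$\log p(Y, Z; W) = \log\left[p(Z)\, p(Y \mid Z, W)\right] \tag{3}$$
$$= -\frac{1}{2\sigma^2}\|Y - f(Z; W)\|^2 - \frac{1}{2}\|Z\|^2 + \text{const}. \tag{4}$$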
The observed-data model is obtained by integrating out Z: p(Y;W) = ∫ p(Z)p(Y|Z,W)dZ. The posterior distribution of Z is given by p(Z|Y,W) = p(Y,Z;W)/p(Y;W) ∝ p(Z)p(Y|Z,W) as a function of Z.
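We want to maximize the observed-data log-likelihood, which is
$$L(W) = \sum_{i=1}^{n} \log p(Y_i; W) = \sum_{i=1}^{n} \log \int p(Y_i, Z_i; W)\, dZ_i.$$
The gradient of L(W) can be calculated according to the following well-known fact that underlies the EM algorithm:
$$\frac{\partial}{\partial W} \log p(Y; W) = \frac{1}{p(Y; W)} \frac{\partial}{\partial W} \int p(Y, Z; W)\, dZ \tag{5}$$
$$= \mathrm{E}_{p(Z \mid Y, W)}\left[\frac{\partial}{\partial W} \log p(Y, Z; W)\right]. \tag{6}$$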
The expectation with respect to p(Z|Y,W) can be approximated by drawing samples from p(Z|Y,W) and then computing the Monte Carlo average.
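The Langevin dynamics for sampling Z ∼ p(Z|Y,W) iterates
$$Z_{\tau+1} = Z_\tau + \delta U_\tau + \frac{\delta^2}{2}\left[\frac{1}{\sigma^2}\big(Y - f(Z_\tau; W)\big)\frac{\partial}{\partial Z} f(Z_\tau; W) - Z_\tau\right], \tag{7}$$
where τ denotes the time step for the Langevin sampling, δ is the step size, and Uτ denotes a random vector that follows N(0, I_d).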
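As an illustration of the update in equation (7), here is a minimal NumPy sketch. It assumes a toy linear generator f(Z; W) = WZ, so the gradient with respect to Z is available in closed form; in the assignment f is a ConvNet and this gradient would come from back-propagation. The function name `langevin_step_z` and all numeric values are hypothetical.

```python
import numpy as np

def langevin_step_z(z, y, W, sigma, delta, rng):
    """One Langevin update of z, following equation (7), for the toy generator f(z; W) = W @ z."""
    residual = y - W @ z                            # Y - f(Z; W)
    grad_log_post = W.T @ residual / sigma**2 - z   # gradient of log p(Z | Y, W) w.r.t. Z
    noise = rng.standard_normal(z.shape)            # U_tau ~ N(0, I_d)
    return z + delta * noise + 0.5 * delta**2 * grad_log_post

rng = np.random.default_rng(0)
W = rng.standard_normal((5, 2))      # hypothetical weights: D = 5, d = 2
y = W @ np.array([1.0, -0.5])        # noiseless toy observation
z = np.zeros(2)                      # starting value of the chain
for _ in range(200):
    z = langevin_step_z(z, y, W, sigma=0.3, delta=0.05, rng=rng)
```

The drift term pulls z toward values that reconstruct y while the prior term −z keeps it near the origin; the noise term keeps the chain from collapsing to a single point.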
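The stochastic gradient algorithm can be used for learning, where in each iteration, for each Zi, only a single copy of Zi is sampled from p(Zi|Yi,W) by running a finite number of steps of Langevin dynamics starting from the current value of Zi, i.e., the warm start. With {Zi} sampled in this manner, we can update the parameter W based on the gradient L′(W), whose Monte Carlo approximation is:
$$L'(W) \approx \sum_{i=1}^{n} \frac{\partial}{\partial W} \log p(Y_i, Z_i; W) \tag{8}$$
$$= -\sum_{i=1}^{n} \frac{\partial}{\partial W} \frac{1}{2\sigma^2}\|Y_i - f(Z_i; W)\|^2 \tag{9}$$
$$= \sum_{i=1}^{n} \frac{1}{\sigma^2}\big(Y_i - f(Z_i; W)\big)\frac{\partial}{\partial W} f(Z_i; W). \tag{10}$$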
Algorithm 1 describes the details of the learning and sampling algorithm.
Algorithm 1 Generator: real inference
Input:
(1) training examples {Yi, i = 1,...,n},
(2) number of Langevin steps l,
(3) number of learning iterations T.
Output:
(1) learned parameters W,
(2) inferred latent factors {Zi, i = 1,...,n}.
1: Let t ← 0, initialize W.
2: Initialize Zi, for i = 1,...,n.
3: repeat
4: Inference step: For each i, run l steps of Langevin dynamics to sample Zi ∼ p(Zi|Yi,W) with warm start, i.e., starting from the current Zi, with each step following equation (7).
5: Learning step: Update W ← W + γt L′(W), where L′(W) is computed according to equation (10), with learning rate γt.
6: Let t ← t + 1.
7: until t = T
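The alternation in Algorithm 1 can be sketched in a few lines of NumPy. This is a toy under stated assumptions, not the contents of GenNet.py: the generator is linear, f(Z; W) = WZ, so both gradients are written by hand, and the function name `train_toy_generator`, the hyperparameter defaults, and the synthetic data are all hypothetical.

```python
import numpy as np

def train_toy_generator(Y, d, l=10, T=300, sigma=0.3, delta=0.05, gamma=5e-4, seed=0):
    """Alternate Langevin inference of Z and gradient ascent on W (toy linear generator)."""
    rng = np.random.default_rng(seed)
    n, D = Y.shape
    W = 0.1 * rng.standard_normal((D, d))   # step 1: initialize W
    Z = rng.standard_normal((n, d))         # step 2: initialize Z_i
    losses = []
    for t in range(T):
        # Inference step: l Langevin steps per Z_i, warm-started from the current Z
        for _ in range(l):
            resid = Y - Z @ W.T
            grad_z = resid @ W / sigma**2 - Z
            Z = Z + delta * rng.standard_normal(Z.shape) + 0.5 * delta**2 * grad_z
        # Learning step: gradient of the complete-data log-likelihood w.r.t. W
        resid = Y - Z @ W.T
        losses.append(0.5 * np.sum(resid**2) / sigma**2)   # reconstruction loss before the update
        W = W + gamma * (resid.T @ Z) / sigma**2
    return W, Z, losses

# Synthetic data from a hypothetical 2-dim latent model with D = 6
rng = np.random.default_rng(1)
Z_true = rng.standard_normal((50, 2))
W_true = rng.standard_normal((6, 2))
Y = Z_true @ W_true.T + 0.1 * rng.standard_normal((50, 6))
W_hat, Z_hat, losses = train_toy_generator(Y, d=2)
```

The `losses` list is the kind of quantity one would plot over iterations; with a ConvNet generator the two hand-written gradients would be replaced by automatic differentiation.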
1.1 TO DO
For the lion-tiger category, learn a model with 2-dim latent factor vector. Fill the blank part of ./GenNet/GenNet.py. Show:
(1) Reconstructed images of training images, using the inferred z from training images.
(2) Randomly generated images, using randomly sampled z.
(3) Generated images with latent factors linearly interpolated from −2 to 2 in each dimension. For example, interpolate 8 points in [−2, 2] for each dimension of z, which gives an 8 × 8 panel of images. You should be able to see the tigers gradually change into lions.
(4) Plot of the loss over iterations.
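For item (3), one possible way to build the 8 × 8 grid of 2-dim latent vectors is shown below. This only constructs the inputs; feeding them through the trained generator and tiling the outputs into a panel is left to the model code. The variable names are hypothetical.

```python
import numpy as np

grid_1d = np.linspace(-2.0, 2.0, 8)                    # 8 evenly spaced points in [-2, 2]
z1, z2 = np.meshgrid(grid_1d, grid_1d, indexing="ij")  # all pairs of coordinates
z_grid = np.stack([z1.ravel(), z2.ravel()], axis=1)    # shape (64, 2), one row per image
```

Row k of `z_grid` corresponds to panel position (k // 8, k % 8), so the first row is (−2, −2) and the last is (2, 2).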
2 Descriptor: real sampling
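The descriptor model is as follows:
$$p_\theta(Y) = \frac{1}{Z(\theta)} \exp\left[f_\theta(Y)\right] p_0(Y), \tag{11}$$
where p0(Y) is the reference distribution, such as Gaussian white noise
$$p_0(Y) = \frac{1}{(2\pi\sigma^2)^{D/2}} \exp\left[-\frac{\|Y\|^2}{2\sigma^2}\right]. \tag{12}$$
The scoring function fθ(Y) is defined by a bottom-up ConvNet whose parameters are denoted by θ. The normalizing constant Z(θ) = ∫ exp[fθ(Y)] p0(Y) dY is analytically intractable. The energy function is
$$\mathcal{E}_\theta(Y) = \frac{\|Y\|^2}{2\sigma^2} - f_\theta(Y). \tag{13}$$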
pθ(Y ) is an exponential tilting of p0.
Suppose we observe training examples {Yi, i = 1,...,n} from an unknown data distribution Pdata(Y). The maximum likelihood learning seeks to maximize the log-likelihood function
$$L(\theta) = \frac{1}{n} \sum_{i=1}^{n} \log p_\theta(Y_i). \tag{14}$$
If the sample size n is large, the maximum likelihood estimator minimizes the Kullback-Leibler divergence KL(Pdata ‖ pθ) from the data distribution Pdata to the model distribution pθ. The gradient of L(θ) is
$$L'(\theta) = \frac{1}{n} \sum_{i=1}^{n} \frac{\partial}{\partial\theta} f_\theta(Y_i) - \mathrm{E}_\theta\left[\frac{\partial}{\partial\theta} f_\theta(Y)\right], \tag{15}$$
where Eθ denotes the expectation with respect to pθ(Y). The key to the above identity is that ∂/∂θ log Z(θ) = Eθ[∂/∂θ fθ(Y)].
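The identity for the log normalizing constant follows from the definition Z(θ) = ∫ exp[fθ(Y)] p0(Y) dY by differentiating under the integral sign, assuming that interchange is valid:

```latex
\begin{aligned}
\frac{\partial}{\partial\theta} \log Z(\theta)
&= \frac{1}{Z(\theta)} \int \frac{\partial}{\partial\theta} \exp\left[f_\theta(Y)\right] p_0(Y)\, dY \\
&= \int \frac{\partial f_\theta(Y)}{\partial\theta}\, \frac{1}{Z(\theta)} \exp\left[f_\theta(Y)\right] p_0(Y)\, dY \\
&= \mathrm{E}_\theta\left[\frac{\partial}{\partial\theta} f_\theta(Y)\right].
\end{aligned}
```

The second line recognizes the factor multiplying the derivative of fθ as the model density pθ(Y), which is what turns the integral into an expectation under the model.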
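The expectation in equation (15) is analytically intractable and has to be approximated by MCMC, such as Langevin dynamics, which iterates the following step:
$$Y_{\tau+1} = Y_\tau - \frac{\delta^2}{2}\left[\frac{Y_\tau}{\sigma^2} - \frac{\partial}{\partial Y} f_\theta(Y_\tau)\right] + \delta U_\tau, \tag{16}$$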
where τ indexes the time steps of the Langevin dynamics, δ is the step size, and Uτ ∼ N(0,I) is Gaussian white noise. The Langevin dynamics relaxes Yτ to a low energy region, while the noise term provides randomness and variability. A Metropolis-Hastings step may be added to correct for the finite step size δ. We can also use Hamiltonian Monte Carlo for sampling the generative ConvNet.
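A minimal NumPy sketch of the Langevin step in equation (16) follows. It assumes a toy linear scoring function fθ(Y) = θ·Y, whose gradient with respect to Y is simply θ, so that the target distribution is the Gaussian N(σ²θ, σ²I) and the sampler can be sanity-checked. The function name `langevin_step_y`, the `grad_f` callback, and the numeric values are hypothetical; no Metropolis-Hastings correction is included.

```python
import numpy as np

def langevin_step_y(y, grad_f, sigma, delta, rng):
    """One Langevin update of y following equation (16); grad_f(y) returns df_theta/dY."""
    grad_energy = y / sigma**2 - grad_f(y)     # gradient of the energy in equation (13)
    noise = rng.standard_normal(y.shape)       # U_tau ~ N(0, I)
    return y - 0.5 * delta**2 * grad_energy + delta * noise

rng = np.random.default_rng(0)
theta = np.array([1.0, -2.0])                  # toy parameters of f_theta(Y) = theta . Y
Y_chain = np.zeros((1000, 2))                  # 1000 parallel chains, D = 2
for _ in range(1500):
    Y_chain = langevin_step_y(Y_chain, lambda y: theta, sigma=1.0, delta=0.1, rng=rng)
```

With these settings the chain means should settle near σ²θ, which gives a quick check that the drift and noise terms have the right signs and scales.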
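We can run ñ parallel chains of Langevin dynamics according to (16) to obtain the synthesized examples {Ỹi, i = 1,...,ñ}. The Monte Carlo approximation to L′(θ) is
$$L'(\theta) \approx \frac{1}{n} \sum_{i=1}^{n} \frac{\partial}{\partial\theta} f_\theta(Y_i) - \frac{1}{\tilde{n}} \sum_{i=1}^{\tilde{n}} \frac{\partial}{\partial\theta} f_\theta(\tilde{Y}_i), \tag{17}$$
which is used to update θ.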
To make Langevin sampling easier, we use the mean images of the training images as the starting point for sampling. That is, we down-sample each training image to a 1×1 patch and up-sample this patch back to the size of the training image. We use cold start for Langevin sampling, i.e., at each iteration, we restart sampling from the mean images.
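One way to realize the mean-image initialization is sketched below. It assumes images stored as an array of shape (n, H, W, C), average pooling for the down-sampling, and constant replication for the up-sampling; the handout does not specify these details, and the function name `mean_image_init` is hypothetical.

```python
import numpy as np

def mean_image_init(images):
    """Replace each image by a constant image holding its per-channel mean."""
    means = images.mean(axis=(1, 2), keepdims=True)      # down-sample to a 1x1 patch per channel
    return np.broadcast_to(means, images.shape).copy()   # up-sample back to H x W

images = np.arange(2 * 4 * 4 * 3, dtype=float).reshape(2, 4, 4, 3)  # toy batch of 2 images
Y_init = mean_image_init(images)
```

Because the result depends only on the training images, it can be computed once and reused as the cold-start point at every learning iteration.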
Algorithm 2 describes the details of the learning and sampling algorithm.
Algorithm 2 Descriptor: real sampling
Input:
(1) training examples {Yi, i = 1,...,n},
(2) number of Langevin steps l,
(3) number of learning iterations T.
Output:
(1) estimated parameters θ,
(2) synthesized examples {Ỹi, i = 1,...,n}.
1: Let t ← 0, initialize θ.
2: repeat
3: For i = 1,...,n, initialize Ỹi to be the mean image of Yi.
4: Run l steps of Langevin dynamics to evolve Ỹi, each step following equation (16).
5: Update θt+1 = θt + γt L′(θt), with step size γt, where L′(θt) is computed according to equation (17).
6: Let t ← t + 1.
7: until t = T
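The loop in Algorithm 2 can be sketched with the same toy assumption of a linear scoring function fθ(Y) = θ·Y, for which ∂fθ/∂θ = Y and ∂fθ/∂Y = θ. This is only meant to show the control flow (cold start, l Langevin steps, update by equation (17)); it is not the contents of DesNet.py, and the function name `train_toy_descriptor`, the defaults, and the synthetic data are hypothetical.

```python
import numpy as np

def train_toy_descriptor(Y, l=30, T=300, sigma=1.0, delta=0.3, gamma=0.1, seed=0):
    """Cold-start Langevin sampling plus gradient ascent on theta (toy linear f_theta)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(Y.shape[1])                                      # step 1: initialize theta
    Y_mean = np.broadcast_to(Y.mean(axis=1, keepdims=True), Y.shape)  # "mean image" of each Y_i
    for t in range(T):
        Y_syn = Y_mean.copy()                                         # step 3: cold start
        for _ in range(l):                                            # step 4: equation (16)
            grad_energy = Y_syn / sigma**2 - theta
            Y_syn = Y_syn - 0.5 * delta**2 * grad_energy + delta * rng.standard_normal(Y_syn.shape)
        # step 5: equation (17), using d f_theta / d theta = Y for the linear toy model
        grad_theta = Y.mean(axis=0) - Y_syn.mean(axis=0)
        theta = theta + gamma * grad_theta
    return theta, Y_syn

rng = np.random.default_rng(2)
Y = rng.normal(loc=[1.0, -1.0, 0.5], scale=1.0, size=(400, 3))  # synthetic "training examples"
theta_hat, Y_syn = train_toy_descriptor(Y)
```

At convergence the update in equation (17) vanishes, so the statistics of the synthesized examples match those of the training examples; here that means the per-dimension means agree. Because the chains are short and cold-started, the learned θ compensates for the unfinished mixing rather than equaling the exact maximum likelihood estimate.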
2.1 TO DO
For the egret category, learn a descriptor model. Fill the blank part of ./DesNet/DesNet.py. Show:
(1) Synthesized images.