Consider p = 1. Simulate 5,000 Monte Carlo samples from the conditional prior β | τ² = 1 and obtain a plot of the density using the R function density().
library(latex2exp)  # provides TeX() for math in plot labels
n <- 5000
# beta | tau^2 = 1 is simply N(0, 1)
plot(density(rnorm(n, 0, 1)), main = TeX(paste("$\\beta$", "marginal")))
[Figure: kernel density of the 5,000 draws of β | τ² = 1 ("β marginal"); N = 5000, bandwidth = 0.1624.]
Part b.
Consider p = 1. Simulate 5,000 Monte Carlo samples from the marginal prior of β, taking λ² = 2 so that E(τ² | λ) = 1. Obtain a plot of the density as in a.
lambda <- sqrt(2)
tau.sq <- rgamma(n, shape = 1, rate = lambda^2/2)  # tau^2 | lambda ~ Exp(lambda^2/2)
beta.marginal <- rnorm(n, 0, sqrt(tau.sq))
plot(density(beta.marginal), main = TeX(paste("$\\lambda^2 = 2$")), xlim = c(-5, 5))
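This scale mixture has a closed-form marginal: if β | τ² ∼ N(0, τ²) and τ² ∼ Exp(λ²/2), then marginally β ∼ Laplace(λ), with density (λ/2)e^(−λ|β|). As a sanity check, the theoretical density can be overlaid on the kernel estimate; a minimal sketch (the grid resolution is an arbitrary choice):

# Overlay the closed-form Laplace(lambda) density on the Monte Carlo estimate:
# the N(0, tau^2) mixture over tau^2 ~ Exp(lambda^2/2) integrates to a Laplace.
grid <- seq(-5, 5, length.out = 500)
lines(grid, lambda/2 * exp(-lambda * abs(grid)), col = "red", lty = 2)
legend("topright", c("Monte Carlo", "Laplace"), col = c("black", "red"), lty = 1:2)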
[Figure: kernel density of the marginal draws of β for λ² = 2; N = 5000, bandwidth = 0.1183.]
Part c.
Consider p = 1. Add a hyperprior on λ with 1/λ ∼ Gamma(a, rate = b). Assess how the marginal prior of β changes for a = 1 and values of b ≥ 1.
set.seed(1)
par(mfrow = c(2, 2))
rates <- c(1, 3, 5, 10)
for (b in rates) {
  lambda <- 1/rgamma(n, 1, b)  # 1/lambda ~ Gamma(1, rate = b)
  tau.sq <- rgamma(n, shape = 1, rate = lambda^2/2)
  beta.marginal <- rnorm(n, 0, sqrt(tau.sq))
  plot(density(beta.marginal), main = paste("rate b =", b), xlim = c(-5, 5))
}
[Figure: 2 × 2 panel of marginal densities of β for rate b = 1, 3, 5, 10 (bandwidths 0.09665, 0.03314, 0.01929, 0.009661); the densities become increasingly concentrated at zero as b grows.]
Part d.
Considering the hyperprior in c., describe a Markov chain Monte Carlo algorithm to sample from the posterior distribution of β and σ².
I will implement a Gibbs sampler with embedded Metropolis steps (Metropolis-within-Gibbs). The model is
y | β, σ² ∼ N(Xβ, σ²I)
βⱼ | τⱼ² ∼ N(0, τⱼ²), j = 1, …, p
τⱼ² | λ ∼ Gamma(1, rate = λ²/2), i.e. Exp(λ²/2)
λ ∼ Inverse-Gamma(a, 1/b)
σ² ∼ Inverse-Gamma(0.1, 0.1).
I need the full conditionals
{β₁, …, βₚ | y, σ², τ₁², …, τₚ², λ},
{σ² | y, β₁, …, βₚ, τ₁², …, τₚ², λ},
{τ₁², …, τₚ² | y, β₁, …, βₚ, σ², λ},
{λ | y, β₁, …, βₚ, σ², τ₁², …, τₚ²},
which are all proportional to the joint posterior
p(β₁, …, βₚ, τ₁², …, τₚ², σ², λ | y) ∝ p(y | β₁, …, βₚ, τ₁², …, τₚ², σ², λ)
× p(β₁, …, βₚ | τ₁², …, τₚ²)
× p(τ₁², …, τₚ² | λ) p(λ) p(σ²),
so I work through each factor in turn.
As a function of σ² alone, this is proportional to
p(y | β, τ², σ², λ) p(σ²) = N(y; Xβ, σ²I) · IG(σ²; q, r), with q = r = 0.1.
To show this is inverse-gamma distributed:
N(y; Xβ, σ²I) IG(σ²; q, r) ∝ (σ²)^(−n/2) exp{−(y − Xβ)ᵀ(y − Xβ)/(2σ²)} · (σ²)^(−q−1) exp{−r/σ²}
= (σ²)^(−(q + n/2) − 1) exp{−[r + (y − Xβ)ᵀ(y − Xβ)/2]/σ²},
which is the kernel of IG(q + n/2, r + (y − Xβ)ᵀ(y − Xβ)/2).
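Since this is a standard family, it can be drawn exactly via the reciprocal of a gamma draw. A minimal R sketch (the name draw_sigma2 is mine; it mirrors the sigma2Draw helper in part e, with q = r = 0.1):

# Draw sigma^2 from IG(q + n/2, r + RSS/2) as 1/Gamma(shape, rate)
draw_sigma2 <- function(y, X, beta, q = 0.1, r = 0.1) {
  rss <- sum((y - X %*% beta)^2)
  1 / rgamma(1, shape = q + length(y)/2, rate = r + rss/2)
}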
As a function of β, the conditional is proportional to
p(y | β, τ², σ², λ) p(β | τ²) = N(y; Xβ, σ²I) · Π N(βᵢ; 0, τᵢ²), i = 1, …, p
= N(y; Xβ, Σ) · N(β; 0, Ω), where Σ = σ²I and Ω = diag(τ₁², …, τₚ²)
= N(β; m, M),
because the posterior kernel is determined by the quadratic form
(y − Xβ)ᵀΣ⁻¹(y − Xβ) + βᵀΩ⁻¹β = (β − m)ᵀM⁻¹(β − m) + const.
Completing the square gives m = MXᵀΣ⁻¹y and M = (XᵀΣ⁻¹X + Ω⁻¹)⁻¹.
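A minimal R sketch of this draw (draw_beta is a hypothetical name; it assumes the mvtnorm package and mirrors the betaMeanCov helper in part e):

# Draw beta from N(m, M), M = (X'X/sigma2 + Omega^{-1})^{-1}, m = M X'y / sigma2
draw_beta <- function(y, X, sigma2, tau2) {
  M <- solve(crossprod(X)/sigma2 + diag(1/tau2, nrow = length(tau2)))
  m <- M %*% crossprod(X, y)/sigma2
  drop(mvtnorm::rmvnorm(1, mean = as.vector(m), sigma = M))
}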
As a function of τ₁², …, τₚ², the target is proportional to
p(β | τ²) p(τ² | λ) = Π N(βᵢ; 0, τᵢ²) · Π Exp(τᵢ²; λ²/2), i = 1, …, p,
which factors over i, so each τᵢ² can be updated one at a time.
Finally, as a function of λ, the target distribution is proportional to
p(τ² | λ) p(λ) = Π Exp(τᵢ²; λ²/2) · IG(λ; a, 1/b), i = 1, …, p.
Neither this nor the τᵢ² conditional is a standard family, so both are updated with Metropolis steps on the log scale.
Now I can build an algorithm that iterates through these conditional distributions. I take the starting value β⁽⁰⁾ to be the least-squares solution β̂, along with σ²(0) = σ̂², the MLE of σ².
Result: samples from the joint posterior p(β, σ² | y)
for s in 1:S do (S = number of samples)
    # Metropolis update of λ on the log scale; the extra log λ* − log λ(s) term in
    # log r is the Jacobian of the transformation λ* ← exp(log λ(s) + ε), ε ∼ N(0, δ²)
    λ* ← exp(log λ(s) + ε), ε ∼ N(0, δ²)
    log r ← log πλ(λ*) − log πλ(λ(s)) + log λ* − log λ(s)
    if log Unif(0, 1) < log r then
        λ(s+1) ← λ*
    else
        λ(s+1) ← λ(s)
    end
    for j in 1:p do
        # same log-scale Metropolis update, with the same Jacobian correction
        τj²* ← exp(log τj²(s) + ε), ε ∼ N(0, δ²)
        log r ← log πτj²(τj²*) − log πτj²(τj²(s)) + log τj²* − log τj²(s)
        if log Unif(0, 1) < log r then
            τj²(s+1) ← τj²*
        else
            τj²(s+1) ← τj²(s)
        end
    end
    σ²(s+1) ∼ IG(n/2 + 0.1, 0.1 + (y − Xβ(s))ᵀ(y − Xβ(s))/2)
    β(s+1) ∼ N(m, M), where M = (XᵀΣ⁻¹X + Ω⁻¹)⁻¹ and m = MXᵀΣ⁻¹y,
        with Σ = σ²(s+1)·I and Ω = diag(τ₁²(s+1), …, τₚ²(s+1))
end
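As a concrete illustration of the log-scale Metropolis step with the Jacobian term, here is a minimal R sketch for the λ update (log_target is assumed to evaluate the log kernel log πλ derived above; the step size delta is a tuning choice, not something fixed by the algorithm):

# One random-walk Metropolis update on the log scale.
# The log(prop) - log(cur) term is the log-Jacobian of x -> exp(x).
mh_log_step <- function(cur, log_target, delta = 1) {
  prop <- exp(log(cur) + rnorm(1, 0, delta))
  logr <- log_target(prop) - log_target(cur) + log(prop) - log(cur)
  if (log(runif(1)) < logr) prop else cur
}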
Part f.
Implement this algorithm in R and compare your results with estimates obtained using glmnet(). In particular, test your results on the diabetes data available in the lars package (use the matrix of predictors x).
## Data processing
library(Rcpp); library(lars); library(glmnet); library(mvtnorm); library(knitr)
sourceCpp("helperFunctions.cpp")  # MH and Gibbs helpers from part e
set.seed(1)
data("diabetes")
X <- cbind(rep(1, nrow(diabetes$x)), diabetes$x)  # prepend an intercept column
y <- diabetes$y
n <- nrow(X); p <- ncol(X)
samples <- 1000
## Initialize starting values
lambda <- 1
tau2 <- rep(1000, p)  # diffuse start for the local variances
beta <- solve(t(X) %*% X) %*% t(X) %*% y             # least-squares start
sigma2 <- t(y - X %*% beta) %*% (y - X %*% beta)/n   # MLE start
sigma2.chain <- lambda.chain <- rep(0, samples)
beta.chain <- tau2.chain <- matrix(0, nrow = p, ncol = samples)
## MCMC
for (s in 2:samples) {
  lambda <- lambdaDraw(current = lambda, tau2 = tau2, a = 1, b = 1)
  tau2 <- tau2Draw(current = tau2, beta = beta, lambda = lambda)
  sigma2 <- sigma2Draw(beta, y, X)
  mM <- betaMeanCov(y, X, sigma2, tau2)
  beta <- t(rmvnorm(n = 1, mean = mM$mean, sigma = mM$cov))
  lambda.chain[s] <- lambda
  sigma2.chain[s] <- sigma2
  beta.chain[, s] <- beta
  tau2.chain[, s] <- tau2
}
# Examine the Markov chains
# plot(beta.chain[1,floor(samples/4):samples],type="l")
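For a quick convergence check beyond trace plots, one option (assuming the coda package, which is not used elsewhere in this write-up) is to look at effective sample sizes after burn-in; a minimal sketch:

library(coda)  # assumed; provides mcmc() and effectiveSize()
burn <- floor(samples/4)
effectiveSize(mcmc(t(beta.chain[, burn:samples])))  # ESS per coefficient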
# Table of coefficients from glmnet and the Bayesian lasso
comparison <- data.frame(
  "Bayesian Lasso" = rowMeans(beta.chain[, floor(samples/4):samples]),
  "Glmnet" = matrix(coef(glmnet(y = y, x = X, alpha = 1), s = 1)[-2])  # drop the duplicated intercept column
)
kable(comparison, "latex", booktabs = T)
              Bayesian.Lasso      Glmnet
(Intercept)      152.0653282   152.13348
age                0.0986925     0.00000
sex             -146.5473735  -195.89915
bmi              515.9450575   522.05142
map              267.6553181   296.18834
tc               -64.7222100  -101.86185
ldl              -36.1819103     0.00000
hdl             -170.9613979  -223.22347
tch               59.2924739     0.00000
ltg              468.2759684   513.57366
glu               50.7368620    53.86052
• I immediately notice the difference in parameterization between glmnet's lasso and my Bayesian lasso: the table compares glmnet's coefficients at λ = 1 with the Bayesian lasso at b = 1. As I show below, shrinkage in this implementation of the Bayesian lasso is very sensitive to the hyperparameter b of λ.
Part g.
Free λ and carry out a sensitivity analysis assessing the behavior of the posterior distribution of β and σ² as the hyperparameters a and b are changed. Explain clearly the rationale you use to assess sensitivity and provide recommendations for the analysis of the diabetes data.
First, as a baseline, I fix λ along a grid and trace out regularization paths to compare with glmnet; the sensitivity analysis with λ free follows.
# Grid of log-lambdas for comparison with glmnet
lambdas <- seq(from = -12, to = -1, length.out = 12)
# Keep track of the posterior mean of beta for each fixed lambda
post.means <- matrix(NA, nrow = p, ncol = length(lambdas))
for (i in 1:length(lambdas)) {
  for (s in 2:samples) {
    lambda <- exp(lambdas[i])  # lambda held fixed at each grid value
    tau2 <- tau2Draw(current = tau2, beta = beta, lambda = lambda)
    sigma2 <- sigma2Draw(beta, y, X)
    mM <- betaMeanCov(y, X, sigma2, tau2)
    beta <- t(rmvnorm(n = 1, mean = mM$mean, sigma = mM$cov))
    sigma2.chain[s] <- sigma2
    beta.chain[, s] <- beta
    tau2.chain[, s] <- tau2
  }
  post.means[, i] <- matrix(rowMeans(beta.chain[, floor(samples/4):samples]), nrow = p)
}
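The plotting code is not shown above; a minimal sketch of how the two path plots can be produced (matplot is base R, and plot.glmnet's xvar = "lambda" gives the glmnet panel):

# Bayesian lasso regularization paths: posterior means against log(lambda)
matplot(lambdas, t(post.means), type = "l", lty = 1,
        xlab = "log lambdas", ylab = "Coefficients")
# glmnet's analogue for the other panel:
plot(glmnet(x = X, y = y, alpha = 1), xvar = "lambda")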
[Figure: regularization paths. Left: glmnet (Coefficients vs Log Lambda). Right: Bayesian lasso posterior means vs log λ over the grid −12 to −1.]
• Glmnet is on the left and the Bayesian lasso is on the right; the regularization paths look very similar.
Now, freeing λ, I assess the sensitivity of the posterior to the hyperparameter b.
## Sequence of b's that define a path of hyperparameters for lambda
bs <- c(seq(from = 1e-5, to = 20, length.out = 30))
## Keep track of posterior means for each b value with lambda free
betasB <- matrix(0, nrow = p, ncol = length(bs))
for (j in 1:length(bs)) {
  for (s in 2:samples) {
    lambda <- lambdaDraw(current = lambda, tau2 = tau2, a = 1, b = 1/bs[j])
    tau2 <- tau2Draw(current = tau2, beta = beta, lambda = lambda)
    sigma2 <- sigma2Draw(beta, y, X)
    mM <- betaMeanCov(y, X, sigma2, tau2)
    beta <- t(rmvnorm(n = 1, mean = mM$mean, sigma = mM$cov))
    beta.chain[, s] <- beta
  }
  betasB[, j] <- rowMeans(beta.chain[, floor(samples/4):samples])
}
[Figure: "Effect of 1/b" — posterior mean coefficients plotted against 1/b from 0 to 20.]
• Though I didn't include the plots, the shrinkage of the coefficients seemed very robust to changes in a for fixed b, so I chose to fix a = 1 and focus on varying b. From the plot, I notice that as 1/b approaches zero the coefficients approach the least-squares estimates, and as 1/b increases there is an increasing amount of shrinkage towards zero. This matches the behavior of the fixed-λ regularization paths above as λ increases.
Part e.
C++ helper functions.
#include <cmath>
#include <vector>
#include <numeric>
#include <random>
#include <RcppArmadillo.h>
using namespace Rcpp;
using namespace std;

// [[Rcpp::depends(RcppArmadillo)]]

// Log kernel of the lambda full conditional: p(tau^2 | lambda) * p(lambda).
// Two corrections to the original: the 2p*log(lambda) term contributed by the
// product of Exp(lambda^2/2) densities, and an accumulate() init of 0.0 (an
// int 0 would silently truncate the doubles).
double loglambdaTarget(double lambda, vector<double> tau2, double a, double b) {
  return 2.0*tau2.size()*log(lambda) + (-a - 1.0)*log(lambda)
         - (pow(lambda, 2)/2.0 * accumulate(tau2.begin(), tau2.end(), 0.0))
         - 1.0/(b*lambda);
}

// Log kernel of the tau_j^2 full conditional: N(beta_j; 0, tau_j^2) * Exp(lambda^2/2)
double logtau2jTarget(double tau2j, double lambda, double betaj) {
  return -log(sqrt(tau2j)) - 1.0/2.0*(1.0/tau2j*pow(betaj, 2) + pow(lambda, 2)*tau2j);
}

// [[Rcpp::export]]
double lambdaDraw(double current, vector<double> tau2, double a, double b) {
  std::random_device rd;
  std::mt19937 mt(rd());
  std::uniform_real_distribution<double> dist(0, 1.0);
  // fixed step size: a state-dependent sd would need an extra proposal correction
  std::normal_distribution<double> norm(0, 1.0);
  double proposed = exp(log(current) + norm(mt));
  double logr = loglambdaTarget(proposed, tau2, a, b)
              - loglambdaTarget(current, tau2, a, b)
              + log(proposed) - log(current);  // log-Jacobian of exp()
  if (log(dist(mt)) < logr) {
    current = proposed;
  }
  return current;
}

// [[Rcpp::export]]
vector<double> tau2Draw(vector<double> current, vector<double> beta, double lambda) {
  std::random_device rd;
  std::mt19937 mt(rd());
  std::uniform_real_distribution<double> dist(0, 1.0);
  std::normal_distribution<double> norm(0, 1);
  for (int j = 0; j < current.size(); j++) {
    double tau2j_proposed = exp(log(current[j]) + norm(mt));
    double logr = logtau2jTarget(tau2j_proposed, lambda, beta[j])
                - logtau2jTarget(current[j], lambda, beta[j])
                + log(tau2j_proposed) - log(current[j]);  // log-Jacobian of exp()
    if (log(dist(mt)) < logr) {
      current[j] = tau2j_proposed;
    }
  }
  return current;
}

// [[Rcpp::export]]
double sigma2Draw(const arma::vec & beta, const arma::vec & y, const arma::mat & X) {
  std::random_device rd;
  std::mt19937 mt(rd());
  int n = X.n_rows;
  // residuals at the *current* beta, per the full conditional
  // (the original recomputed least-squares residuals via arma::solve)
  arma::colvec resid = y - X*beta;
  double rss = arma::as_scalar(arma::trans(resid)*resid);
  // IG(n/2 + 0.1, rss/2 + 0.1) via the reciprocal of a gamma draw
  std::gamma_distribution<double> gamma(n/2.0 + 0.1, 1.0/(rss/2.0 + 0.1));
  return 1.0/gamma(mt);
}

// [[Rcpp::export]]
List betaMeanCov(const arma::vec & y, const arma::mat & X, double sigma2,
                 const arma::vec & tau2) {
  arma::mat tau2_inv = arma::inv(arma::diagmat(tau2));   // Omega^{-1}
  arma::mat M = arma::inv(X.t()*X/sigma2 + tau2_inv);    // posterior covariance
  arma::colvec m = M*X.t()*y/sigma2;                     // posterior mean
  return List::create(Named("cov") = M, Named("mean") = m);
}

// Unfinished wrapper that would run the whole chain in C++:
// List mcmc(double lambda, double sigma2, const arma::vec & tau2, const arma::vec & beta2){
//   return List::create(Named("beta.chain") = M, Named("sigma2.chain") = m,
//                       Named("tau2.chain") = M, Named("lambda.chain") = M);
// }