$30
The writing of the project report, either in LaTeX or R Markdown, should detail all the theoretical aspects of the methods and provide all the necessary graphical displays necessary to the understanding of the particular cases.
Random variable generation
In the resolution of the following problems do not use any of the R random variable generation built-in functions, nor any R function referring to the densities of random variables.
1. Let X ∼ TruncatedPareto(α,L,H), which has probability density function
.
Fix your R random seed to 123 in the simulations.
(a) Analytically derive the cumulative distribution function (c.d.f.) F of X as well as its inverse F−1.
(b) Describe in detail (theory + algorithm) the inverse-transform (IT) method for generating samples from continuous distributions (as is our case). Implement this method in R. Call your routine sim.IT() and let it receive as input a generic sample size m as well as generic α,L and H parameters.
(c) Use routine sim.IT() to generate a sample of size m = 10000 of
X ∼ TruncatedPareto(0.5,2,4).
Report the first 10 simulated values. Explicitly derive and simplify the expression of the p.d.f.. Plot the sample histogram with the true p.d.f. superimposed.
(d) Describe in detail (theory + algorithm) the acceptance-rejection (AR) method for generating samples from continuous distributions (as is our case). Identify the candidate density function for the particular case of the X ∼ TruncatedPareto(0.5,2,4) and compute by hand the constants of the AR method (namely, M and the acceptance prob-
ability α). Use the R function optimize() or other to confirm the result you obtained for M.
(e) Implement the AR method in R. Call your routine sim.AR() and let it receive as input a generic sample size m as well as generic α,L and H parameters. Besides returning the simulated values of the target distribution, the sim.AR() routine should also return the simulated values that were rejected.
(f) Use routine sim.AR() to generate a sample of size m = 10000 of
X ∼ TruncatedPareto(0.5,2,4).
Report the first 10 simulated values of the TruncatedPareto(0.5,2,4) and the rejection rate. Plot the sample histogram with the true p.d.f. superimposed.
Graphically display, the candidate and p.d.f. functions against the hits and misses of the AR method (as done in class – week 2) when used to simulate just m = 15 sample values.
(g) Compare the computational times of routines sim.IT() and sim.AR() for generating a sample of size m = 50000 from the truncated Pareto distribution (you can use the R routine proc.time() as done in class – week 2).
2. Some random variables can be generated from the exponential distribution (exponentialbased method), which we know how to obtain from the U(0,1) distribution. Such is the case of the random variable X ∼ Gamma(α,θ):
α
If Yi ∼ Exp(1) then X = θ XYi ∼ Gamma(α,θ), α = 1,2,...
iid i=1
Fix your R random seed to 654 in the simulations.
a) Implement the exponential-based method in R for generating a sample from X ∼ Gamma(α,θ), which has probability density function (p.d.f.)
,
where Γ is the gamma function.
Call your routine sim.gam() and let it receive as input a generic sample size m and the Gamma distribution parameters α (note that here α ∈N) and θ. Provide both algorithm and R code.
b) Use routine sim.gam() to generate a sample of size m = 10000 from X ∼ Gamma(2,1).
Plot the histogram with the pdf superimposed.
Monte Carlo methods: integration
In the resolution of the problems in this section you may already use the R random variable generation built-in functions as well as any R function referring to the densities of random variables.
Fix your R random seed to 987 in the simulations. Let
a) Use the R function integrate() to compute the value of I.
b) Describe and implement in R the Monte Carlo method for estimating I (use size m = 10000). Report an estimate of the variance of the Monte Carlo estimator IˆMC of I.
c) Describe and implement in R the Monte Carlo methods of based on antithetic variables, control variables and importance sampling for estimating I (size m = 10000). Report an estimate of the variance of all the Monte Carlo estimators Iˆant, Iˆcont and Iˆis of I.
d) What’s the percentage of variance reduction that is achieved when using those MC estimators instead of IˆMC?
Monte Carlo methods: confidence intervals
In the resolution of the problems in this section you may already use the R random variable generation built-in functions as well as any R function referring to the densities of random variables.
Fix your R random seed to 569 in all simulations.
4. Assume X ∼ N(µ,σ2) with µ unknown and σ2 known. Let X1,...,Xn be a random sample of population X. One has that the random interval given by
,
is a (1 − α) × 100% confidence interval (CI) for µ regardless of sample size n.
In this exercise, one wishes to assess how a certain type of data contamination affects the coverage probability of a 95% CI for µ, given a random sample of a population X ∼ N(µ,1) of size n = 20, via a Monte Carlo simulation study (with m = 10000 simulations). For that purpose, consider that 90% of those n observations are drawn from a X ∼ N(0,1) distribution and that the remaining 10% are drawn from the contaminated normal distribution X ∼ N(k,1) with k = 1,5,9. The bad data points generated in this way are called shift outliers because the bad data points are shifted from µ = 0 in k standard deviations.
Present a thorough discussion of your results. How harmful can this type of contamination be in terms of the level of confidence of a CI? Would your results still hold for other levels of confidence other than 95%?