Algae status is monitored at 274 sites in Finnish lakes and rivers. The 2008 algae status observed at each site is given in the algae dataset ('0': no algae, '1': algae present). The data can be accessed from the bsda R package as follows:
library(bsda)
data("algae")
# the data is now stored in the variable 'algae'
head(algae)
## [1] 0 1 1 0 0 0
So that you can test the correctness of your code implementations, we provide some results for the following test data. It is also possible to check the functions you need to implement with markmyassignment.
algae_test <- c(0, 1, 1, 0, 0, 0)
Note! This data is only for testing; use the full data algae when reporting your results.
Let π be the probability of a monitoring site having detectable blue-green algae levels and y the observations in algae. Use a binomial model for the observations y and a Beta(2,10) prior for the binomial model parameter π to formulate a Bayesian model. It is not necessary to derive the posterior distribution for π here, as this has already been done in the book and it suffices to refer to that derivation. It is also not necessary to write out the distributions; it is sufficient to use the label-parameter format, e.g. Beta(·,·).
Your task is to perform Bayesian inference for the binomial model and answer the following questions based on it:
a) formulate (1) the likelihood p(y|π) as a function of π, (2) the prior p(π), and (3) the resulting posterior p(π|y). Report the posterior in the format Beta(·,·), where you replace ·’s with the correct numerical values.
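For reference, the standard conjugate result from the book can be written as below. Here $n$ is the number of monitored sites and, as an assumed shorthand, $y$ denotes the number of sites with algae (the sum of the 0/1 observations) rather than the observation vector itself:

$$
\begin{aligned}
p(y \mid \pi) &= \operatorname{Bin}(y \mid n, \pi) \propto \pi^{y}(1-\pi)^{n-y},\\
p(\pi) &= \operatorname{Beta}(\pi \mid 2, 10) \propto \pi^{2-1}(1-\pi)^{10-1},\\
p(\pi \mid y) &\propto \pi^{2+y-1}(1-\pi)^{10+n-y-1}
\quad\Longrightarrow\quad \pi \mid y \sim \operatorname{Beta}(2+y,\ 10+n-y).
\end{aligned}
$$

With algae_test ($n = 6$, $y = 2$) this gives Beta(4, 14), whose mean $4/18 \approx 0.2222$ agrees with the test output of beta_point_est.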
b) What can you say about the value of the unknown π according to the observations and your prior knowledge? Summarize your results with a point estimate (i.e. E(π|y)) and a 90% posterior interval. Note! Posterior intervals are also called credible intervals and are different from confidence intervals. Note! In your report, use the values from the data algae, not algae_test.
beta_point_est(prior_alpha = 2, prior_beta = 10, data = algae_test)
## [1] 0.2222222
beta_interval(prior_alpha = 2, prior_beta = 10, data = algae_test, prob = 0.9)
## [1] 0.0846451 0.3956414
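One possible way to implement these two functions is a direct conjugate update followed by R's built-in Beta functions. This is a hypothetical sketch, not the reference solution: the helper posterior_params is an illustrative name not given in the assignment, data is assumed to be a 0/1 vector, and the interval is assumed to be the central (equal-tailed) one.

```r
# Hypothetical sketch; assumes `data` is a vector of 0/1 observations.

# Hypothetical helper: conjugate update Beta(alpha, beta) prior + binomial data
posterior_params <- function(prior_alpha, prior_beta, data) {
  y <- sum(data)     # number of sites with algae
  n <- length(data)  # number of monitored sites
  c(alpha = prior_alpha + y, beta = prior_beta + n - y)
}

# Posterior mean E(pi | y) = alpha / (alpha + beta)
beta_point_est <- function(prior_alpha, prior_beta, data) {
  p <- posterior_params(prior_alpha, prior_beta, data)
  p[["alpha"]] / (p[["alpha"]] + p[["beta"]])
}

# Central (equal-tailed) posterior interval with coverage `prob`
beta_interval <- function(prior_alpha, prior_beta, data, prob = 0.9) {
  p <- posterior_params(prior_alpha, prior_beta, data)
  tail_prob <- (1 - prob) / 2  # probability mass left out in each tail
  qbeta(c(tail_prob, 1 - tail_prob),
        shape1 = p[["alpha"]], shape2 = p[["beta"]])
}

algae_test <- c(0, 1, 1, 0, 0, 0)
beta_point_est(prior_alpha = 2, prior_beta = 10, data = algae_test)
beta_interval(prior_alpha = 2, prior_beta = 10, data = algae_test, prob = 0.9)
```

With algae_test this should reproduce the test values shown above; swap in algae for the report.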
c) What is the probability that the proportion π of monitoring sites with detectable algae levels is smaller than π0 = 0.2, the value known from historical records?
beta_low(prior_alpha = 2, prior_beta = 10, data = algae_test, pi_0 = 0.2)
## [1] 0.4511238
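Since the posterior is Beta, the requested probability P(π < π0 | y) is simply the posterior CDF evaluated at π0. A minimal hypothetical sketch of beta_low along those lines, again assuming data is a 0/1 vector:

```r
# Hypothetical sketch of beta_low(); assumes `data` is a 0/1 vector.
# P(pi < pi_0 | y) is the posterior Beta CDF evaluated at pi_0.
beta_low <- function(prior_alpha, prior_beta, data, pi_0) {
  y <- sum(data)     # number of sites with algae
  n <- length(data)  # number of monitored sites
  pbeta(pi_0, shape1 = prior_alpha + y, shape2 = prior_beta + n - y)
}

algae_test <- c(0, 1, 1, 0, 0, 0)
beta_low(prior_alpha = 2, prior_beta = 10, data = algae_test, pi_0 = 0.2)
```

For algae_test the posterior is Beta(4, 14), so this evaluates pbeta(0.2, 4, 14), which should match the test value above.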
d) What assumptions are required in order to use this kind of a model with this type of data? (No need to discuss exchangeability yet, as it is discussed in more detail in BDA Chapter 5 and Lecture 7)
e) Perform a prior sensitivity analysis by testing a couple of different reasonable priors, and plot the resulting posteriors. Summarize the results in one or two sentences.
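A prior sensitivity analysis can be sketched by repeating the conjugate update for several priors and overlaying the posterior densities. The priors below are illustrative assumptions, not prescribed by the assignment: a uniform Beta(1, 1), the original Beta(2, 10), and two stronger priors with the same prior mean 1/6. The sketch uses algae_test so it runs without the package; replace it with algae for the report.

```r
# Hypothetical sketch: compare posteriors under several Beta priors.
algae_test <- c(0, 1, 1, 0, 0, 0)  # replace with `algae` for the report

# Illustrative priors (assumed choices): uniform, original, and two
# increasingly concentrated priors with the same mean 1/6
priors <- data.frame(
  alpha = c(1, 2, 10, 20),
  beta  = c(1, 10, 50, 100)
)

y <- sum(algae_test)     # number of sites with algae
n <- length(algae_test)  # number of monitored sites
grid <- seq(0, 1, length.out = 501)

# One column of posterior density values per prior (conjugate Beta update)
densities <- sapply(seq_len(nrow(priors)), function(i) {
  dbeta(grid, priors$alpha[i] + y, priors$beta[i] + n - y)
})

matplot(grid, densities, type = "l", lty = 1, col = seq_len(nrow(priors)),
        xlab = expression(pi), ylab = "posterior density")
legend("topright",
       legend = sprintf("Beta(%g, %g) prior", priors$alpha, priors$beta),
       col = seq_len(nrow(priors)), lty = 1)

# Posterior means as a compact numerical summary
post_mean <- (priors$alpha + y) / (priors$alpha + priors$beta + n)
print(data.frame(priors, post_mean))
```

Keeping the prior mean fixed while increasing α + β isolates the effect of prior strength, which makes the comparison easier to summarize in a sentence or two.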
Hint! With a conjugate prior, the posterior has a closed-form Beta distribution (see the equations in the book). Useful R functions: dbeta, pbeta, qbeta.