1. (Basic probability theory notation and terms). This may be trivial, or you may need to refresh your memory on these concepts. Note that some terms may be different names for the same concept. Explain each of the following terms with one sentence:
probability
probability mass
probability density
probability mass function (pmf)
probability density function (pdf)
probability distribution
discrete probability distribution
continuous probability distribution
cumulative distribution function (cdf)
likelihood
aleatoric uncertainty
epistemic uncertainty
2. (Basic computer skills) This task deals with elementary plotting and computing skills needed during the rest of the course. You can use either R or Python, although R is the recommended language and we will only guarantee support in R. For documentation in R, just type ?{function name here}.
a) Plot the density function of the Beta distribution with mean µ = 0.2 and variance σ² = 0.01. The parameters α and β of the Beta distribution are related to the mean and variance according to the following equations:
$$\alpha = \mu\left(\frac{\mu(1-\mu)}{\sigma^2} - 1\right), \qquad \beta = \frac{\alpha(1-\mu)}{\mu}.$$
Hint! Useful R functions: seq(), plot() and dbeta(). Later on we will also use the more flexible ggplot2 for plotting.
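As a starting point, part a) could be sketched roughly as follows. This is only one possible approach, not a reference solution. It assumes the standard moment relations for the Beta distribution, α = µ(µ(1 − µ)/σ² − 1) and β = α(1 − µ)/µ; the variable names `a`, `b`, `x` and `dens` and the grid size of 1000 points are illustrative choices.

```r
# Target mean and variance given in the exercise
mu <- 0.2
sigma2 <- 0.01

# Convert mean and variance to Beta parameters
# (assumed standard moment relations for the Beta distribution)
a <- mu * (mu * (1 - mu) / sigma2 - 1)
b <- a * (1 - mu) / mu

# Evaluate the density on an evenly spaced grid over [0, 1]
x <- seq(0, 1, length.out = 1000)
dens <- dbeta(x, a, b)

# Draw the density as a line
plot(x, dens, type = "l", xlab = "x", ylab = "density")
```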
b) Take a sample of 1000 random numbers from the above distribution and plot a histogram of the results. Compare visually to the density function.
Hint! Useful R functions: rbeta() and hist()
c) Compute the sample mean and variance from the drawn sample. Verify that they roughly match the true mean and variance of the distribution.
Hint! Useful R functions: mean() and var()
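Parts b) and c) might be approached along the following lines. This is a minimal sketch, not a reference solution. The seed, the number of histogram bins and the variable names are arbitrary choices, and the Beta parameters are recomputed from the mean and variance using the standard moment relations.

```r
set.seed(1)  # arbitrary seed, used only to make the sketch reproducible

# Beta parameters from mean 0.2 and variance 0.01 (standard moment relations)
mu <- 0.2
sigma2 <- 0.01
a <- mu * (mu * (1 - mu) / sigma2 - 1)
b <- a * (1 - mu) / mu

# b) Draw 1000 random numbers and plot a histogram on the density scale
draws <- rbeta(1000, a, b)
hist(draws, breaks = 30, freq = FALSE, xlab = "x", main = "")

# Overlay the exact density for a visual comparison
curve(dbeta(x, a, b), add = TRUE)

# c) Sample mean and variance, to be compared with 0.2 and 0.01
sample_mean <- mean(draws)
sample_var <- var(draws)
print(c(mean = sample_mean, var = sample_var))
```

With only 1000 draws the sample statistics will differ slightly from the true values, and the size of the difference changes with the seed.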
d) Estimate the central 95% probability interval of the distribution from the drawn samples.
Hint! Useful R functions: quantile()
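For part d), one way to read "central 95% interval" is as the range between the 2.5% and 97.5% sample quantiles. The sketch below follows that reading. The seed and variable names are arbitrary, and the comparison against `qbeta()` is an optional extra check rather than something the exercise asks for.

```r
set.seed(1)  # arbitrary seed, used only to make the sketch reproducible

# Beta parameters from mean 0.2 and variance 0.01 (standard moment relations)
mu <- 0.2
sigma2 <- 0.01
a <- mu * (mu * (1 - mu) / sigma2 - 1)
b <- a * (1 - mu) / mu

draws <- rbeta(1000, a, b)

# Central 95% interval estimated from the sample
interval <- quantile(draws, probs = c(0.025, 0.975))

# Exact quantiles of the distribution, for comparison only
exact <- qbeta(c(0.025, 0.975), a, b)

print(interval)
print(exact)
```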
3. (Bayes’ theorem) A group of researchers has designed a new inexpensive and painless test for detecting lung cancer. The test is intended to be an initial screening test for the general population. A positive result (presence of lung cancer) from the test would be followed up immediately with medication, surgery or more extensive and expensive tests. The researchers know from their studies the following facts:
The test gives a positive result 98% of the time when the test subject has lung cancer.
The test gives a negative result 96% of the time when the test subject does not have lung cancer.
In the general population, approximately one person in 1000 has lung cancer.
The researchers are happy with these preliminary results (about 97% success rate), and wish to get the test to market as soon as possible. How would you advise them? Base your answer on Bayes’ rule computations.
Hint: A relatively high false negative rate (cancer doesn’t get detected) or a high false positive rate (medication is administered unnecessarily) is typically undesirable in a test.
Hint: Here are some probability values that can help you figure out if you copied the right conditional probabilities from the question.
P(Test gives positive | Subject does not have lung cancer) = 4%
P(Test gives positive and Subject has lung cancer) = 0.098%. This is also referred to as the joint probability of the test being positive and the subject having lung cancer.
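The Bayes' rule computation can be organised as in the following sketch. It is one possible arrangement of the numbers given in the question, not a complete answer, and it covers only the probability of cancer given a positive result. All variable names are illustrative.

```r
# Quantities stated in the exercise
p_cancer <- 0.001            # prevalence in the general population
p_pos_given_cancer <- 0.98   # P(positive | cancer)
p_neg_given_healthy <- 0.96  # P(negative | no cancer)

# Complement: P(positive | no cancer), the 4% from the hint
p_pos_given_healthy <- 1 - p_neg_given_healthy

# Joint probability P(positive and cancer), the 0.098% from the hint
p_joint <- p_pos_given_cancer * p_cancer

# Total probability of a positive result
p_pos <- p_joint + p_pos_given_healthy * (1 - p_cancer)

# Bayes' rule: P(cancer | positive)
p_cancer_given_pos <- p_joint / p_pos
print(p_cancer_given_pos)
```

The same pattern, with the roles of positive and negative swapped, gives the probability of cancer after a negative result.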
4. (Bayes’ theorem) We have three boxes, A, B, and C. There are
2 red balls and 5 white balls in the box A,
4 red balls and 1 white ball in the box B, and
1 red ball and 3 white balls in the box C.
Consider a random experiment in which one of the boxes is randomly selected and from that box, one ball is randomly picked up. After observing the color of the ball it is replaced in the box it came from. Suppose also that on average box A is selected 40% of the time and box B 10% of the time (i.e. P(A) = 0.4).
a) What is the probability of picking a red ball?
b) If a red ball was picked, from which box did it most probably come?
Implement two functions in R that compute these probabilities. Below is an example of how the functions should be named and how they should work if you want to check them with markmyassignment.
boxes <- matrix(c(2,2,1,5,5,1), ncol = 2,
    dimnames = list(c("A", "B", "C"), c("red", "white")))
boxes
##   red white
## A   2     5
## B   2     5
## C   1     1
p_red(boxes = boxes)
## [1] 0.3928571
p_box(boxes = boxes)
## [1] 0.29090909 0.07272727 0.63636364
Note! This is a test case, you will need to change the numbers in the matrix to the numbers in the exercise.
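A minimal sketch of how the two functions might be structured is given below, using the test-case matrix rather than the numbers in the exercise. It is not a reference solution. The vector `box_prior` is a helper introduced here for illustration; it assumes P(A) = 0.4 and P(B) = 0.1 as stated, with P(C) = 0.5 following because the three probabilities must sum to one.

```r
# Assumed box selection probabilities: P(A) and P(B) from the exercise,
# P(C) = 1 - 0.4 - 0.1 (hypothetical helper, not part of the required interface)
box_prior <- c(A = 0.4, B = 0.1, C = 0.5)

# Marginal probability of drawing a red ball (law of total probability)
p_red <- function(boxes) {
  p_red_given_box <- boxes[, "red"] / rowSums(boxes)
  sum(p_red_given_box * box_prior)
}

# Probability of each box given that a red ball was drawn (Bayes' rule)
p_box <- function(boxes) {
  p_red_given_box <- boxes[, "red"] / rowSums(boxes)
  joint <- p_red_given_box * box_prior
  unname(joint / sum(joint))
}

# Test-case matrix from the example (not the numbers in the exercise)
boxes <- matrix(c(2, 2, 1, 5, 5, 1), ncol = 2,
                dimnames = list(c("A", "B", "C"), c("red", "white")))
print(p_red(boxes = boxes))
print(p_box(boxes = boxes))
```

Hard-coding the prior outside the functions keeps the signatures matching the example calls. Passing it as an extra argument with a default value would be an equally reasonable design.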
5. (Bayes’ theorem) Assume that on average fraternal twins (two fertilized eggs, which could then be of different sex) occur once in 150 births and identical twins (a single egg divides into two separate embryos, so both have the same sex) once in 400 births (Note! These are not the true values, see Exercise 1.6, page 28, in BDA3). American male singer-actor Elvis Presley (1935–1977) had a twin brother who died at birth. Assume that an equal number of boys and girls are born on average. What is the probability that Elvis was an identical twin? Show the steps you used to derive the equations for computing that probability.
Implement this as a function in R that computes the probability.
Below is an example of how the functions should be named and work if you want to check your result with markmyassignment.
p_identical_twin(fraternal_prob = 1/125, identical_prob = 1/300)
## [1] 0.4545455
p_identical_twin(fraternal_prob = 1/100, identical_prob = 1/500)
## [1] 0.2857143
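One way the function could be written is sketched below. It is not a reference solution. It assumes that, for a male child with a twin, the twin is a brother with probability 1 if the twins are identical and with probability 1/2 if they are fraternal, which follows from the equal-sex-ratio assumption in the exercise. The intermediate variable names are illustrative.

```r
p_identical_twin <- function(fraternal_prob, identical_prob) {
  # Assumed likelihoods of "the twin is a brother" for a male child:
  # identical twins always share the same sex, fraternal twins do so half the time
  p_brother_given_identical <- 1
  p_brother_given_fraternal <- 1 / 2

  # Unnormalised posterior weights of the two twin types
  w_identical <- p_brother_given_identical * identical_prob
  w_fraternal <- p_brother_given_fraternal * fraternal_prob

  # Bayes' rule: normalise over the two possible twin types
  w_identical / (w_identical + w_fraternal)
}

print(p_identical_twin(fraternal_prob = 1/125, identical_prob = 1/300))
print(p_identical_twin(fraternal_prob = 1/100, identical_prob = 1/500))
```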