
Statistical Models: Project 1


Consider the following graphical structures, corresponding to two different Bayesian networks. For which network does the statement A ⊥ B | C hold? For which does the statement A ⊥ B hold? Prove your answers using the laws of probability.

[Figure: graphical structures (a) and (b), not reproduced]
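The figures are not reproduced here, so the sketch below assumes (hypothetically) that one panel is a fork A ← C → B and the other a collider A → C ← B, the two classic cases. For the fork, the factorization $P(a, b, c) = P(c)\,P(a \mid c)\,P(b \mid c)$ gives $P(a, b \mid c) = P(a \mid c)\,P(b \mid c)$, so A ⊥ B | C; for the collider, summing $P(a)\,P(b)\,P(c \mid a, b)$ over $c$ gives $P(a, b) = P(a)\,P(b)$, so A ⊥ B. The Python code checks both statements numerically on binary variables with arbitrary, made-up conditional probabilities. A check on one parameterization illustrates the proof but does not replace it.

```python
from itertools import product

def bern(p, x):
    """P(X = x) for a binary variable with P(X = 1) = p."""
    return p if x == 1 else 1 - p

def joint_fork(pC, pA_given_C, pB_given_C):
    """Fork A <- C -> B: P(a, b, c) = P(c) P(a | c) P(b | c)."""
    return {(a, b, c): bern(pC, c) * bern(pA_given_C[c], a) * bern(pB_given_C[c], b)
            for a, b, c in product([0, 1], repeat=3)}

def joint_collider(pA, pB, pC_given_AB):
    """Collider A -> C <- B: P(a, b, c) = P(a) P(b) P(c | a, b)."""
    return {(a, b, c): bern(pA, a) * bern(pB, b) * bern(pC_given_AB[(a, b)], c)
            for a, b, c in product([0, 1], repeat=3)}

def marginal(P, keep):
    """Sum out all coordinates of the joint table not listed in `keep`."""
    out = {}
    for k, v in P.items():
        key = tuple(k[i] for i in keep)
        out[key] = out.get(key, 0.0) + v
    return out

def indep_AB(P, tol=1e-12):
    """A ⊥ B  <=>  P(a, b) = P(a) P(b) for all a, b."""
    PAB, PA, PB = marginal(P, (0, 1)), marginal(P, (0,)), marginal(P, (1,))
    return all(abs(PAB[(a, b)] - PA[(a,)] * PB[(b,)]) < tol
               for a, b in product([0, 1], repeat=2))

def indep_AB_given_C(P, tol=1e-12):
    """A ⊥ B | C  <=>  P(a, b, c) P(c) = P(a, c) P(b, c) for all a, b, c."""
    PC, PAC, PBC = marginal(P, (2,)), marginal(P, (0, 2)), marginal(P, (1, 2))
    return all(abs(P[(a, b, c)] * PC[(c,)] - PAC[(a, c)] * PBC[(b, c)]) < tol
               for a, b, c in product([0, 1], repeat=3))

# Arbitrary illustrative conditional probabilities (not from the assignment).
fork = joint_fork(0.3, {0: 0.2, 1: 0.9}, {0: 0.6, 1: 0.1})
collider = joint_collider(0.4, 0.7, {(0, 0): 0.1, (0, 1): 0.8, (1, 0): 0.6, (1, 1): 0.95})
print(indep_AB_given_C(fork), indep_AB(fork))          # True False
print(indep_AB(collider), indep_AB_given_C(collider))  # True False
```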

Problem 2: Markov blanket

Consider the following graphical structure of a Bayesian network:

[Figure: graphical structure of the Bayesian network, not reproduced]

Determine the Markov blanket MB(D) of node D and show that the conditional probability P(D | A,B,C,E,F,G) is the same as P(D | MB(D)).
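The Markov blanket of a node consists of its parents, its children, and the other parents of its children. The underlying argument is that in $P(D \mid \text{rest}) \propto P(D \mid \mathrm{pa}(D)) \prod_{X \in \mathrm{ch}(D)} P(X \mid \mathrm{pa}(X))$ every factor of the joint distribution not involving D cancels between numerator and denominator. Because the figure is not reproduced, the Python sketch below uses a hypothetical DAG over A to G (A → C, B → D, C → D, D → E, G → E, E → F) with random binary conditional probability tables, and checks by brute-force enumeration that P(D | all other nodes) equals P(D | MB(D)).

```python
import itertools
import random

# Hypothetical DAG (the assignment's figure is not reproduced), as parent lists:
# A -> C, B -> D, C -> D, D -> E, G -> E, E -> F
PARENTS = {
    "A": [], "B": [], "G": [],
    "C": ["A"],
    "D": ["B", "C"],
    "E": ["D", "G"],
    "F": ["E"],
}

def markov_blanket(parents, node):
    """Parents, children, and the children's other parents of `node`."""
    children = {v for v, ps in parents.items() if node in ps}
    co_parents = {p for c in children for p in parents[c]} - {node}
    return set(parents[node]) | children | co_parents

def random_cpts(parents, rng):
    """P(X = 1 | parent configuration) for every binary node X."""
    return {
        v: {cfg: rng.random() for cfg in itertools.product([0, 1], repeat=len(ps))}
        for v, ps in parents.items()
    }

def joint(parents, cpts, x):
    """P(x) = product over nodes v of P(x_v | x_pa(v))."""
    p = 1.0
    for v, ps in parents.items():
        p1 = cpts[v][tuple(x[q] for q in ps)]
        p *= p1 if x[v] == 1 else 1 - p1
    return p

def prob_one(parents, cpts, node, given):
    """P(node = 1 | given), summing the joint over all unobserved nodes."""
    free = [v for v in parents if v != node and v not in given]
    num = den = 0.0
    for vals in itertools.product([0, 1], repeat=len(free)):
        for d in (0, 1):
            p = joint(parents, cpts, {**given, **dict(zip(free, vals)), node: d})
            den += p
            if d == 1:
                num += p
    return num / den

rng = random.Random(0)
cpts = random_cpts(PARENTS, rng)
mb = markov_blanket(PARENTS, "D")
others = [v for v in PARENTS if v != "D"]
max_gap = 0.0
for vals in itertools.product([0, 1], repeat=len(others)):
    full = dict(zip(others, vals))                       # values of A, B, C, E, F, G
    p_full = prob_one(PARENTS, cpts, "D", full)          # P(D = 1 | A, B, C, E, F, G)
    p_mb = prob_one(PARENTS, cpts, "D", {v: full[v] for v in mb})  # P(D = 1 | MB(D))
    max_gap = max(max_gap, abs(p_full - p_mb))
print(sorted(mb), max_gap)  # ['B', 'C', 'E', 'G'] and a gap at round-off level
```

For this hypothetical graph the blanket of D is {B, C, E, G}: A and F drop out because their influence on D is fully mediated by C and E, respectively.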

Problem 3: Learning Bayesian networks from protein data                                                                   

In this exercise, we will use the R package BiDAG [1] to learn Bayesian networks from protein data. The data provided in sachs.data.txt consist of measurements of 11 phosphorylated proteins and phospholipids derived from primary immune system cells, subjected to both general and specific molecular interventions [2]. (Hint: read the help files of the package and use default parameters unless otherwise stated.)

(a)     First, run set.seed(2022) for reproducibility. Read in the data from sachs.data.txt. Report the number of variables n and the number of observations N. Randomly split the data into 80% training data and 20% test data. Initialize the parameters using the function scoreparameters with the training data and the Bayesian Gaussian equivalent (BGe) score [3, 4].

[Note: The BGe score is a fully decomposable marginal likelihood P(D | G) for scoring Bayesian networks, where D denotes the data and G the DAG. The main underlying assumption is that the data are normally distributed as $N(\mu, W^{-1})$. The precision matrix $W$ follows a Wishart prior $W_n(T^{-1}, \alpha_w)$, where $\alpha_w > n - 1$ is the degrees of freedom and $T$ is a positive definite parameter matrix. Conditional on $W$, the mean vector $\mu$ follows a normal prior $N(\nu, (\alpha_\mu W)^{-1})$ with $\alpha_\mu > 0$.]
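The bookkeeping of part (a) can be sketched in Python, taking the number of variables n as the number of columns and the number of observations N as the number of rows. This is only an illustration: it uses Python's own random generator, so it will not reproduce the split obtained in R after set.seed(2022), and the function name train_test_split and the toy data are hypothetical.

```python
import random

def train_test_split(rows, train_frac=0.8, seed=2022):
    """Randomly split a list of observations into training and test sets."""
    rng = random.Random(seed)          # seeded for reproducibility (Python RNG, not R's)
    idx = list(range(len(rows)))
    rng.shuffle(idx)                   # random permutation of row indices
    n_train = round(train_frac * len(rows))
    train = [rows[i] for i in idx[:n_train]]
    test = [rows[i] for i in idx[n_train:]]
    return train, test

# Toy data: 10 observations (rows) of 2 variables (columns); not the sachs data.
rows = [[float(i), float(2 * i)] for i in range(10)]
n, N = len(rows[0]), len(rows)
train, test = train_test_split(rows)
print(n, N, len(train), len(test))  # 2 10 8 2
```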

(b)     Learn a Bayesian network using the order MCMC algorithm. Plot the directed acyclic graph (DAG). Evaluate the log BGe score of the test data against the estimated DAG. (Hint: one can use the R package graph for the plot.) (1 point)

(c)      One of the arguments of the scoreparameters function is bgepar = list(am = 1, aw = NULL), which holds the hyperparameters $\alpha_\mu$ (am) and $\alpha_w$ (aw) of the BGe score. By default, $\alpha_\mu = 1$ and $\alpha_w = n + \alpha_\mu + 1$.
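For the sachs data, n = 11 (the number of measured proteins and phospholipids), so the default hyperparameters work out as below, which also satisfies the constraint $\alpha_w > n - 1$ from the note to part (a):

$$
\alpha_w = n + \alpha_\mu + 1 = 11 + 1 + 1 = 13 > n - 1 = 10
$$

Assuming the same default rule is applied whenever am is changed and aw = NULL, $\alpha_w$ ranges from about 12 (am $= 10^{-5}$) to 112 (am $= 10^{2}$) in the experiment below.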

Now, consider the set of values $\{10^{-5}, 10^{-3}, 10^{-1}, 10, 10^{2}\}$ for am and keep aw = NULL fixed. For each value, repeat the process of splitting the data, initializing the parameters, and learning the DAG 10 times. Then, report the average number of edges in the DAGs and the average log BGe score of the test data in a table like the one shown below. Remember to run set.seed(2022) for reproducibility. (Hint: running the code in parallel with the package parallel can help reduce the runtime.)

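| Parameter am                              | $10^{-5}$ | $10^{-3}$ | $10^{-1}$ | $10$ | $10^{2}$ |
|-------------------------------------------|-----------|-----------|-----------|------|----------|
| Average number of edges                   |           |           |           |      |          |
| Average log BGe score of the test data    |           |           |           |      |          |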
What do you observe? Choose the value of am with the highest average test BGe score and plot the DAG re-learned from the whole dataset with that value.
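The aggregation for part (c) can be sketched in Python, assuming each of the 10 repetitions per am value has already produced the learned DAG's adjacency matrix and the test-data log BGe score (in the assignment these come from BiDAG in R). The helper names count_edges and summarize are hypothetical. summarize averages the edge counts and test scores per am and returns the am with the highest average test score, i.e. the value used to re-learn the DAG on the whole dataset.

```python
from collections import defaultdict
from statistics import mean

def count_edges(adj):
    """Number of directed edges in a DAG given as a 0/1 adjacency matrix."""
    return sum(sum(row) for row in adj)

def summarize(runs):
    """Average results per am value.

    runs: iterable of (am, n_edges, test_log_bge) tuples, one per repetition.
    Returns ({am: (avg_edges, avg_test_log_bge)}, best_am), where best_am has
    the highest average test log BGe score.
    """
    by_am = defaultdict(list)
    for am, n_edges, score in runs:
        by_am[am].append((n_edges, score))
    table = {
        am: (mean(e for e, _ in vals), mean(s for _, s in vals))
        for am, vals in by_am.items()
    }
    best_am = max(table, key=lambda am: table[am][1])
    return table, best_am
```

The runs list would be filled by a loop over the am values and repetitions; since the repetitions are independent, they can be distributed across cores, as the hint about the parallel package suggests.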
