CS760 Assignment 7

1        Directed Graphical Model
Consider the directed graphical model (aka Bayesian network) in Figure 1.



Figure 1: A Bayesian Network example.

Compute P(B = t | E = f, J = t, M = t) and P(B = t | E = t, J = t, M = t). These are the conditional probabilities of a burglar in your house (yikes!) when both of your neighbors John and Mary call you and say they hear an alarm in your house, without an earthquake or with an earthquake also going on in that area (what a busy day), respectively.
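Since Figure 1 is not reproduced here, the following minimal sketch shows how both queries can be answered by enumerating over the hidden Alarm variable once the network's CPTs are known. All numeric probabilities below are placeholder assumptions (the classic alarm-network values), not the values from Figure 1; substitute the figure's CPTs before trusting the output.

# Inference by enumeration in the Burglary/Earthquake/Alarm/JohnCalls/MaryCalls network.
# All CPT numbers are placeholder assumptions standing in for Figure 1.

def p_alarm(a, b, e):
    # P(A = a | B = b, E = e); placeholder table
    p_true = {(True, True): 0.95, (True, False): 0.94,
              (False, True): 0.29, (False, False): 0.001}[(b, e)]
    return p_true if a else 1.0 - p_true

def joint(b, e, a, j, m):
    # P(B, E, A, J, M) = P(B) P(E) P(A | B, E) P(J | A) P(M | A)
    p_b = 0.001 if b else 0.999                   # placeholder P(B)
    p_e = 0.002 if e else 0.998                   # placeholder P(E)
    p_j = 0.90 if a else 0.05                     # placeholder P(J = t | A)
    p_m = 0.70 if a else 0.01                     # placeholder P(M = t | A)
    p_j = p_j if j else 1.0 - p_j
    p_m = p_m if m else 1.0 - p_m
    return p_b * p_e * p_alarm(a, b, e) * p_j * p_m

def posterior_burglary(e, j=True, m=True):
    # P(B = t | E = e, J = j, M = m), summing out the unobserved alarm A
    num = sum(joint(True, e, a, j, m) for a in (True, False))
    den = sum(joint(b, e, a, j, m) for b in (True, False) for a in (True, False))
    return num / den

print(posterior_burglary(e=False))   # P(B = t | E = f, J = t, M = t)
print(posterior_burglary(e=True))    # P(B = t | E = t, J = t, M = t)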

2        Chow-Liu Algorithm
Suppose we wish to construct a directed graphical model for 3 features X, Y, and Z using the Chow-Liu algorithm. We are given data from 100 independent experiments where each feature is binary and takes value T or F. Below is a table summarizing the observations from these experiments:




                                                                             X      Y      Z     Count







1.   Compute the mutual information I(X, Y) based on the frequencies observed in the data.

2.   Compute the mutual information I(X, Z) based on the frequencies observed in the data.

3.   Compute the mutual information I(Z, Y) based on the frequencies observed in the data.

4.   Which undirected edges will be selected by the Chow-Liu algorithm as the maximum spanning tree? (See the computational sketch after this list.)

5.   Root your tree at node X and assign directions to the selected edges.
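The following is a minimal sketch of these computations. Because the count table above did not survive intact, the eight joint counts in the code are placeholder assumptions (summing to 100); replace them with the counts from the table.

from collections import Counter
from itertools import combinations
from math import log2

# Placeholder joint counts over (X, Y, Z); substitute the assignment's table.
counts = {('T','T','T'): 36, ('T','T','F'): 4, ('T','F','T'): 2, ('T','F','F'): 8,
          ('F','T','T'): 9, ('F','T','F'): 1, ('F','F','T'): 8, ('F','F','F'): 32}
n = sum(counts.values())

def mutual_information(i, j):
    # I(Vi, Vj) in bits, estimated from the empirical joint frequencies
    pair, pi, pj = Counter(), Counter(), Counter()
    for assignment, c in counts.items():
        pair[(assignment[i], assignment[j])] += c
        pi[assignment[i]] += c
        pj[assignment[j]] += c
    return sum((c / n) * log2((c / n) / ((pi[a] / n) * (pj[b] / n)))
               for (a, b), c in pair.items() if c > 0)

names = ['X', 'Y', 'Z']
mi = {(names[i], names[j]): mutual_information(i, j) for i, j in combinations(range(3), 2)}
print(mi)                                        # I(X,Y), I(X,Z), I(Y,Z)

# With three nodes, the maximum spanning tree keeps the two edges with the
# largest mutual information; rooting at X then orients the edges away from X.
print(sorted(mi, key=mi.get, reverse=True)[:2])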

3        Kernel SVM
Consider the following kernel function defined over z, z′ ∈ Z:

k(z, z′) = 1 if z = z′,  and  k(z, z′) = 0 otherwise.

1.   Prove that for any integer m > 0 and any z1,...,zm ∈ Z, the m × m kernel matrix K = [Kij] is positive semi-definite, where Kij = k(zi, zj) for i, j = 1,...,m. (Let us assume that for i ≠ j, we have zi ≠ zj.) Hint: an m × m matrix K is positive semi-definite if ∀v ∈ R^m, v⊤Kv ≥ 0.

2.   Given a training set (z1, y1),...,(zn, yn) with binary labels, the dual SVM problem with the above kernel k will have parameters α1,...,αn, b ∈ R. (Assume that for i ≠ j, we have zi ≠ zj.) The predictor for input z takes the form

f(z) = Σ_{i=1}^n αi yi k(zi, z) + b.

Recall the label prediction is sgn(f(z)). Prove that there exist α1,...,αn, b such that f correctly separates the training set. In other words, k induces a feature space rich enough that any training set can be linearly separated in it. (A small numerical sketch appears at the end of this section.)

3.   How does such an f predict an input z that is not in the training set?

Comment: One useful property of kernel functions is that the input space Z does not need to be a vector space; in other words, z does not need to be a feature vector. For all we know, Z could be the set of all turkeys in the world. As long as we can compute k(z, z′), kernel SVM works on turkeys.
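Below is a minimal numerical sketch of this kernel on a toy, non-vector input space; the three string-valued training points and their labels are illustrative assumptions, not part of the assignment.

import numpy as np

def k(z, zp):
    # the identity kernel defined above
    return 1.0 if z == zp else 0.0

train = [('turkey-a', +1), ('turkey-b', -1), ('turkey-c', +1)]

# Part 1: with distinct inputs the kernel matrix is the identity, so
# v⊤Kv = Σi vi² ≥ 0 for every v, i.e. K is positive semi-definite.
K = np.array([[k(z, zp) for zp, _ in train] for z, _ in train])
print(np.linalg.eigvalsh(K))                    # all eigenvalues are 1 >= 0

# Part 2: choosing αi = 1 and b = 0 gives f(zi) = yi on every training point,
# so sgn(f) separates any training set under this kernel.
alpha, b = [1.0] * len(train), 0.0
def f(z):
    return sum(a * y * k(zi, z) for a, (zi, y) in zip(alpha, train)) + b

print([np.sign(f(z)) == y for z, y in train])   # [True, True, True]

# Part 3: for a z outside the training set every k(zi, z) is 0, so f(z) = b
# and the prediction sgn(b) does not depend on z at all.
print(f('turkey-unseen'))                       # 0.0 (= b)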

4        Extra Credit: Kernel functions over discrete space
Kernel functions can be defined over objects as diverse as graphs, sets, strings, and text documents. Consider, for instance, a fixed set D and define a nonvectorial space consisting of all possible subsets of this set D. If A1 and A2 are two such subsets then one simple choice of kernel would be

k(A1, A2) = 2^|A1 ∩ A2|

where A1 ∩ A2 denotes the intersection of sets A1 and A2, and |A| denotes the size of A. Show that this is a valid kernel function by showing that it corresponds to an inner product in some feature space.
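One candidate feature map (stated here as a sketch, not as the required proof) indexes φ(A) by all subsets S ⊆ D, with entry 1 exactly when S ⊆ A; the inner product ⟨φ(A1), φ(A2)⟩ then counts the subsets of A1 ∩ A2, which is 2^|A1 ∩ A2|. The following check uses a small assumed ground set D:

from itertools import chain, combinations

D = {'a', 'b', 'c', 'd'}          # assumed ground set, for illustration only

def powerset(s):
    s = list(s)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def phi(A):
    # feature vector indexed by all subsets S of D: 1 if S ⊆ A, else 0
    return [1 if S <= set(A) else 0 for S in powerset(D)]

A1, A2 = {'a', 'b', 'c'}, {'b', 'c', 'd'}
inner = sum(u * v for u, v in zip(phi(A1), phi(A2)))
print(inner, 2 ** len(A1 & A2))   # both print 4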




5        Extra Credit: Support Vector Machines
Given data {(xi, yi), 1 ≤ i ≤ n}, the (hard margin) SVM objective is

min_{w,b}  (1/2)‖w‖²

s.t. yi(w⊤xi + b) ≥ 1 (∀i).

The dual is

max_α  Σ_{i=1}^n αi − (1/2) Σ_{i=1}^n Σ_{j=1}^n αi αj yi yj xi⊤xj

s.t. αi ≥ 0 (∀i),  Σ_{i=1}^n αi yi = 0.

Suppose the optimal solution for the dual is α1*,...,αn*, and the optimal solution for the primal is (w*, b*). Show that the margin

γ = 1/‖w*‖

satisfies

γ = (Σ_{i=1}^n αi*)^(−1/2),  i.e.  Σ_{i=1}^n αi* = ‖w*‖².
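A minimal numerical check of this identity, assuming a small linearly separable toy dataset (the four points below are illustrative) and solving the dual with scipy's SLSQP routine:

import numpy as np
from scipy.optimize import minimize

# Toy separable data with labels in {+1, -1} (illustrative assumption).
X = np.array([[2.0, 2.0], [1.5, 2.5], [-2.0, -1.0], [-1.0, -2.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
G = (y[:, None] * X) @ (y[:, None] * X).T        # Gij = yi yj xi⊤xj

# Dual: maximize Σ αi - (1/2) α⊤Gα  s.t. αi ≥ 0 and Σ αi yi = 0.
def neg_dual(a):
    return 0.5 * a @ G @ a - a.sum()

res = minimize(neg_dual, x0=np.zeros(len(y)), method='SLSQP',
               bounds=[(0.0, None)] * len(y),
               constraints=[{'type': 'eq', 'fun': lambda a: a @ y}])
alpha = res.x

# Recover the primal weight vector and the margin γ = 1/‖w*‖.
w = (alpha * y) @ X
gamma = 1.0 / np.linalg.norm(w)

# The claimed identity: Σ αi* = ‖w*‖² = 1/γ².
print(alpha.sum(), np.linalg.norm(w) ** 2, 1.0 / gamma ** 2)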
