$25
• For hands-on parts, your source code should be submitted as well (include it in the above zip file).
Theoretical questions:
1. Consider two distributions P(X,Y ) and Q(X,Y ) over random variables X and Y . Prove that the KL divergence between these two distributions can be factorized as
KL[P(X,Y )||Q(X,Y )] = KL[P(Y )||Q(Y )] + KL[P(X | Y )||Q(X | Y )]
2. Suppose you have discrete state, action, and observation spaces, X, A and Z, respectively. Consider the transition and observation models PT(X′ | X,a) and PZ(z | X) are given for any X′,X′ ∈ X, a ∈ A and z ∈ Z. Denote number of elements in some set S by |S| (also known as cardinality). For example, the state and observation at any time i can only assume a value from X = {x1,...xn} and Z = {z1,...,zm}, respectively, with |X| = n and |Z| = m (similarly for action space A).
In this question we consider a single look-ahead step (myopic) planning, and we are interested in understanding the computational complexity of calculating the objective function for a single candidate action. Suppose the
.
current belief is bk = b[Xk], consider some candidate action ak ∈ A and assume a general reward function r(b,a).
(a) Write the objective function J(bk,ak) for a single look-ahead step case. Write the expectation operator explicitly. Describe the steps to calculate J(bk,ak) for the given problem setting.
(b) Derive an expression for the probability of acquiring a future observation zk+1 ∈ Z while considering candidate action ak and current belief bk. Express your answer only in terms of quantities given in the question.
(c) What is the computational complexity of performing the calculations in clause 2b? Denote it by Oz, and express your answer in terms of cardinality of the action, observation and state spaces.
(d) Write explicit equations for calculating the posterior future belief[1] bk+1 =. b[Xk+1], given current belief is bk = b[Xk] while considering action ak ∈ A and future observation zk+1 ∈ Z.
(e) What is the computational complexity of performing the calculations in clause 2d? Denote it by Ob and express your answer in terms of cardinality of the action, observation and state spaces.
.
(f) Consider now the reward function is entropy, i.e. r(b[X],a) = H(b[X]). Write down the corresponding expression for a given belief b[X] and indicate the computational complexity to calculate it. Denote it by Or and express your answer in terms of cardinality of the action, observation and state spaces.
(g) What is the computational complexity of calculating exactly the objective function J(bk,ak) (from clause 2a)? Express your answer in terms Or, Ob, Oz and cardinality of the action, observation and state spaces.
[1] Assume recursive formulation.