
Statistical Natural Language Processing

Sheet IV


Information Theory
1) Information content (1 point)
Assume that the probability of being male is p(M) = 0.5, and likewise for being female, p(F) = 0.5. Suppose that 20% of males are tall (T) and that 6% of females are tall.

Calculate the probability that somebody who is "tall" (meaning taller than 6 ft, say) is male.

Now, if you know that somebody is male, how much information do you gain (in bits) by learning that he is also tall? How much do you gain by learning that a female is tall? Finally, how much information do you gain from learning that a tall person is female?
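
As a sanity check, the following minimal Python sketch plugs the probabilities given above into Bayes' rule and the surprisal formula -log2 p(·); nothing beyond the stated values is assumed.

```python
from math import log2

p_M, p_F = 0.5, 0.5          # prior probabilities of male / female
p_T_given_M = 0.20           # probability of being tall given male
p_T_given_F = 0.06           # probability of being tall given female

# Marginal probability of being tall (law of total probability).
p_T = p_M * p_T_given_M + p_F * p_T_given_F

# Bayes' rule: probability that a tall person is male.
p_M_given_T = p_M * p_T_given_M / p_T
p_F_given_T = 1.0 - p_M_given_T

# Information content (surprisal) in bits: I(event) = -log2 p(event).
info_tall_given_male   = -log2(p_T_given_M)   # learning that a male is tall
info_tall_given_female = -log2(p_T_given_F)   # learning that a female is tall
info_female_given_tall = -log2(p_F_given_T)   # learning that a tall person is female

print(f"p(M|T) = {p_M_given_T:.4f}")
print(f"I(T|M) = {info_tall_given_male:.2f} bits")
print(f"I(T|F) = {info_tall_given_female:.2f} bits")
print(f"I(F|T) = {info_female_given_tall:.2f} bits")
```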

2) Entropy (3 points)

Consider a binary symmetric communication channel whose input source is the alphabet X = {0, 1} with probabilities {0.5, 0.5} and whose output alphabet is Y = {0, 1}; the channel is shown in Figure 1.

 

Figure 1: Binary symmetric communication channel, with inputs X on the left, outputs Y on the right, and p the probability of transmission error.

•    (0.5 points) What is the entropy of the source, H(X)?

•    (1 point) What is the probability distribution of the outputs, p(Y), and the entropy of this output distribution, H(Y)?

•    (1 point) What is the joint probability distribution for the source and the output, p(X,Y), and what is the joint entropy, H(X,Y)?

•    (0.5 points) What is the mutual information of this channel, I(X;Y)? See: https://en.wikipedia.org/wiki/Mutual_information
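
The quantities above can be computed mechanically once the error probability p is fixed. The sketch below (plain Python) treats p as a parameter; the value p = 0.1 in the example call is an assumption for illustration only, since the exercise leaves p symbolic.

```python
from math import log2
from itertools import product

def H(dist):
    """Entropy in bits of a distribution given as a dict of probabilities."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def bsc_quantities(p_err):
    """Entropies and mutual information of a binary symmetric channel
    with uniform input {0, 1} and bit-flip probability p_err."""
    p_x = {0: 0.5, 1: 0.5}                       # source distribution p(X)

    # Joint distribution p(x, y) = p(x) * p(y | x); the channel flips x with prob. p_err.
    p_xy = {}
    for x, y in product((0, 1), repeat=2):
        p_y_given_x = 1 - p_err if x == y else p_err
        p_xy[(x, y)] = p_x[x] * p_y_given_x

    # Output distribution p(y), obtained by marginalising the joint over x.
    p_y = {y: sum(p_xy[(x, y)] for x in (0, 1)) for y in (0, 1)}

    H_X, H_Y, H_XY = H(p_x), H(p_y), H(p_xy)
    I_XY = H_X + H_Y - H_XY                      # mutual information I(X;Y)
    return p_y, H_X, H_Y, H_XY, I_XY

# Example with an assumed error probability; the sheet keeps p symbolic.
p_y, H_X, H_Y, H_XY, I_XY = bsc_quantities(p_err=0.1)
print(f"p(Y) = {p_y}")
print(f"H(X) = {H_X:.4f}, H(Y) = {H_Y:.4f}, H(X,Y) = {H_XY:.4f}, I(X;Y) = {I_XY:.4f}")
```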

3) Kullback-Leibler Divergence (2 points)
First, prove that the Kullback-Leibler divergence does not satisfy the triangle inequality and therefore cannot be considered a true distance metric (1 point).
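
One convenient route is a numerical counterexample. The Bernoulli parameters in the sketch below (0.5, 0.25, 0.1) are an illustrative choice, not part of the exercise; any triple of distributions with D(P||R) > D(P||Q) + D(Q||R) settles the question.

```python
from math import log2

def kl_bernoulli(a, b):
    """D(Ber(a) || Ber(b)) in bits."""
    return a * log2(a / b) + (1 - a) * log2((1 - a) / (1 - b))

# Hypothetical Bernoulli parameters chosen for illustration.
a, b, c = 0.5, 0.25, 0.1
d_pr = kl_bernoulli(a, c)
d_pq = kl_bernoulli(a, b)
d_qr = kl_bernoulli(b, c)

print(f"D(P||R)           = {d_pr:.4f} bits")
print(f"D(P||Q) + D(Q||R) = {d_pq + d_qr:.4f} bits")
print("Triangle inequality violated:", d_pr > d_pq + d_qr)
```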

Now, let the random variable X have three possible outcomes a,b,c. Consider two distributions on this random variable:

Symbol    a    b    c
p(x)      [values not reproduced in this extract]
q(x)      [values not reproduced in this extract]
Calculate H(p), H(q), D(p||q) and D(q||p). Verify that in this case D(p||q) ≠ D(q||p) (1 point).
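
Because the probability values in the table are not reproduced here, the sketch below uses placeholder distributions over {a, b, c} purely to show the computation; substitute the values from the sheet to obtain the required numbers.

```python
from math import log2

def entropy(dist):
    """Entropy H(p) in bits."""
    return -sum(px * log2(px) for px in dist.values() if px > 0)

def kl(p, q):
    """Kullback-Leibler divergence D(p||q) in bits (assumes q(x) > 0 wherever p(x) > 0)."""
    return sum(px * log2(px / q[x]) for x, px in p.items() if px > 0)

# Placeholder distributions -- replace with the values from the exercise sheet.
p = {"a": 0.5, "b": 0.25, "c": 0.25}
q = {"a": 1/3, "b": 1/3, "c": 1/3}

print(f"H(p)    = {entropy(p):.4f} bits")
print(f"H(q)    = {entropy(q):.4f} bits")
print(f"D(p||q) = {kl(p, q):.4f} bits")
print(f"D(q||p) = {kl(q, p):.4f} bits")
```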

4) Codes (2 points)

Consider the following source alphabet and its letter probabilities:

Letter         A      B      C      D
Probability    [values not reproduced in this extract]

•    (0.5 points) What is the entropy H, in bits, of the above source alphabet?

•    (1 point) Why are fixed-length codes inefficient for alphabets whose letters are not equiprobable? Discuss this in relation to Morse Code.

•    (0.5 points) Offer an example of a uniquely decodable prefix code for the above alphabet. What features make it a uniquely decodable prefix code?
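
The letter probabilities are likewise missing from this extract, so the sketch below assumes an illustrative distribution (0.5, 0.25, 0.125, 0.125) and one standard prefix code; with the sheet's actual probabilities, only the dictionary at the top needs to change.

```python
from math import log2

# Assumed letter probabilities for illustration -- replace with the values on the sheet.
probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}

# Entropy of the source alphabet in bits per letter.
H = -sum(p * log2(p) for p in probs.values())
print(f"H = {H:.3f} bits/letter")

# One example of a prefix code: no codeword is a prefix of any other,
# so a bit stream can be decoded greedily left to right without separators.
code = {"A": "0", "B": "10", "C": "110", "D": "111"}

# Expected code length vs. a fixed-length code (2 bits for 4 letters).
avg_len = sum(probs[s] * len(cw) for s, cw in code.items())
print(f"average code length = {avg_len:.3f} bits/letter (fixed-length code: 2 bits/letter)")

def decode(bits, code):
    """Greedy left-to-right decoding; works because the code is prefix-free."""
    inverse = {cw: s for s, cw in code.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:          # a complete codeword has been read
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

print(decode("0101100111", code))   # -> "ABCAD"
```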

5) Martian codes and Kraft’s inequality (2 points)
Martians have landed on Earth and you have found one of their code books. In it you find a code entry of the form:

[code entry not reproduced in this extract]

The S_i's are encoded into strings from a D-symbol output alphabet in a uniquely decodable manner. If m = 6 and the code word lengths are (l_1, l_2, ..., l_6) = (1, 1, 2, 3, 2, 3), what is the most likely base for the Martian code (i.e. what is a good lower bound for D)?
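
A minimal sketch for this question: by McMillan's theorem, a uniquely decodable code must satisfy Kraft's inequality, sum_i D^(-l_i) <= 1, so scanning over candidate alphabet sizes D with the given lengths reveals the smallest admissible base.

```python
lengths = [1, 1, 2, 3, 2, 3]   # codeword lengths from the Martian code book

def kraft_sum(D, lengths):
    """Kraft sum  sum_i D**(-l_i)  for a D-ary output alphabet."""
    return sum(D ** (-l) for l in lengths)

# A uniquely decodable code must satisfy the Kraft sum <= 1 (McMillan's theorem),
# so the smallest D that satisfies it is a lower bound on the alphabet size.
for D in range(2, 7):
    s = kraft_sum(D, lengths)
    print(f"D = {D}: Kraft sum = {s:.4f}  ({'ok' if s <= 1 else 'violates Kraft'})")
```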

