1. Consider the 3×3 Wumpus world shown below. The goal of this simplified game is to be collocated with the gold (where we receive a +1000 reward) and not collocated with the Wumpus or a pit (where we receive a -1000 reward). All other states have a reward of -1. As before, the agent starts in [1,1], but has only four possible actions: Up, Down, Left, Right (there is no orientation or turning). Each of these actions always succeeds, although attempting to move into a wall results in the agent not moving. We will use reinforcement learning to solve this problem.
a. Compute the utility U(s) of each non-terminal state s given the policy shown above. Note that [1,3], [2,3], [3,3], and [3,1] are terminal states, where U([1,3]) = -1000, U([2,3]) = +1000, U([3,3]) = -1000, and U([3,1]) = -1000. You may assume γ = 0.9. (A computational sketch for this part appears after part b.)
b. Using temporal difference Q-learning, compute the Q values for Q([1,1],Right), Q([2,1],Up), and Q([2,2],Up) after each of ten executions of the action sequence Right, Up, Up (starting from [1,1] for each sequence). You may assume α = 0.9, γ = 0.9, and that all Q values for non-terminal states are initially zero. (A sketch of these updates appears below.)
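For part a, the following is a minimal policy-evaluation sketch in Python. The policy arrows live in the figure and are not reproduced in this text, so the policy dictionary below is a hypothetical placeholder; substitute the actual arrows from the figure before trusting the numbers.

```python
# Iterative policy evaluation for the deterministic 3x3 world:
# U(s) = -1 + gamma * U(next state under the policy).
GAMMA = 0.9
TERMINAL = {(1, 3): -1000, (2, 3): 1000, (3, 3): -1000, (3, 1): -1000}
MOVES = {"Up": (0, 1), "Down": (0, -1), "Left": (-1, 0), "Right": (1, 0)}

# HYPOTHETICAL policy over the five non-terminal squares -- replace these
# actions with the arrows actually shown in the figure.
policy = {(1, 1): "Right", (2, 1): "Up", (2, 2): "Up",
          (1, 2): "Down", (3, 2): "Left"}

def step(state, action):
    """Deterministic transition; moving into a wall leaves the agent in place."""
    x, y = state
    dx, dy = MOVES[action]
    nx, ny = x + dx, y + dy
    return (nx, ny) if 1 <= nx <= 3 and 1 <= ny <= 3 else state

def evaluate(policy, sweeps=200):
    """Repeated Bellman backups; converges because gamma < 1."""
    U = {s: 0.0 for s in policy}
    U.update(TERMINAL)
    for _ in range(sweeps):
        for s in policy:
            U[s] = -1 + GAMMA * U[step(s, policy[s])]
    return U

print(evaluate(policy))
```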
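For part b, the next sketch runs the TD Q-learning updates along the fixed trajectory. It assumes one common bookkeeping convention, namely that the reward is collected on entering the successor state and that the max over a terminal state's Q values is zero; if the lecture uses a different convention, adjust the update line accordingly.

```python
# TD Q-learning along the fixed trajectory Right, Up, Up from [1,1], repeated
# for ten episodes: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
ALPHA, GAMMA = 0.9, 0.9
REWARD = {(2, 3): 1000, (1, 3): -1000, (3, 3): -1000, (3, 1): -1000}
ACTIONS = ["Up", "Down", "Left", "Right"]

Q = {}  # missing entries default to zero, matching the problem statement

def q(s, a):
    return Q.get((s, a), 0.0)

def update(s, a, s_next):
    r = REWARD.get(s_next, -1)  # -1 everywhere except on entering a terminal
    best_next = 0.0 if s_next in REWARD else max(q(s_next, b) for b in ACTIONS)
    Q[(s, a)] = q(s, a) + ALPHA * (r + GAMMA * best_next - q(s, a))

# The actions Right, Up, Up deterministically trace [1,1]->[2,1]->[2,2]->[2,3].
for episode in range(1, 11):
    for s, a, s_next in [((1, 1), "Right", (2, 1)),
                         ((2, 1), "Up", (2, 2)),
                         ((2, 2), "Up", (2, 3))]:
        update(s, a, s_next)
    print(episode, q((1, 1), "Right"), q((2, 1), "Up"), q((2, 2), "Up"))
```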
2. Given the following bigram model, compute the probability of the two sentences below. Show your work. (A computational sketch follows the sentences.)
Word 1    Word 2    Frequency
the       player        2,000
player    is            1,000
is        next          3,000
next      to            4,000
to        the           6,000
to        a             5,000
the       gold          2,000
a         pit           1,000
a. “the player is next to the gold”
b. “the player is next to a pit”
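The following is a minimal sketch of the chain-rule computation, assuming the usual estimate P(w2 | w1) = freq(w1 w2) / Σ freq(w1 ·) over the table above, and omitting the first word's unigram probability (a common simplification; include it if the course's formulation requires it).

```python
# Bigram chain rule: P(w1 ... wn) ~= product of P(wi | wi-1), where each
# conditional is the bigram's frequency divided by the total frequency of
# its first word in the table.
BIGRAMS = {("the", "player"): 2000, ("player", "is"): 1000,
           ("is", "next"): 3000, ("next", "to"): 4000,
           ("to", "the"): 6000, ("to", "a"): 5000,
           ("the", "gold"): 2000, ("a", "pit"): 1000}

def sentence_probability(sentence):
    words = sentence.split()
    p = 1.0
    for w1, w2 in zip(words, words[1:]):
        total_w1 = sum(f for (first, _), f in BIGRAMS.items() if first == w1)
        p *= BIGRAMS.get((w1, w2), 0) / total_w1
    return p

print(sentence_probability("the player is next to the gold"))  # sentence a
print(sentence_probability("the player is next to a pit"))     # sentence b
```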
3. Suppose we add the following two words to the lexicon on slide 23 of the lecture notes:
Noun → player
Adverb → next
Using this augmented lexicon and the grammar on slide 24 of the lecture notes, show all possible parse trees for each of the following sentences. If there is no parse, then show a new grammar rule consisting of only non-terminals that will allow the sentence to be parsed, and show the parse tree. (A parser sketch follows the sentences.)
a. “the player is next to the gold”
b. “the player is next to the gold in 2 3”
c. “the player is next to the gold and a pit”
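Because the slide 23/24 lexicon and grammar are not reproduced here, the sketch below assumes a small grammar fragment in the spirit of the AIMA E0 grammar that such lecture notes typically follow; treat it as a way to sanity-check hand-drawn trees, not as the official grammar. For sentence c, a rule such as NP → NP Conjunction NP is one candidate non-terminal-only addition, assuming the lexicon already supplies Conjunction → and.

```python
# Enumerate parses with NLTK's chart parser over an ASSUMED grammar fragment.
import nltk

grammar = nltk.CFG.fromstring("""
  S  -> NP VP
  NP -> Article Noun | Digit Digit | NP PP
  VP -> Verb | VP Adverb | VP PP
  PP -> Preposition NP
  Article -> 'the' | 'a'
  Noun -> 'player' | 'gold' | 'pit'
  Verb -> 'is'
  Adverb -> 'next'
  Preposition -> 'to' | 'in'
  Digit -> '2' | '3'
""")

parser = nltk.ChartParser(grammar)
for sent in ["the player is next to the gold",          # sentence a
             "the player is next to the gold in 2 3"]:  # sentence b
    for tree in parser.parse(sent.split()):
        tree.pretty_print()
# Sentence c fails under this fragment; adding NP -> NP Conjunction NP
# (with Conjunction -> 'and' in the lexicon) makes it parseable.
```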
4. Repeat problem 2a, but using the bigram model available from www.ngrams.info. Specifically, download the zip file available at https://www.ngrams.info/iweb/iweb_ngrams_sample.zip. This zip file contains a bigram model in the file “ngrams_words_2.txt”. Use this file to compute the probability of the sentence in problem 2a. Show your work. (A computational sketch follows the notes and excerpt below.) Notes:
• Ignore case in the bigram model, i.e., treat all letters as lowercase. For example, when computing the frequency of “is”, sum the frequencies of all bigrams that begin with “is”, “Is” and “IS”.
• If a bigram appears more than once in the model, then use the sum of all the frequencies for calculating probabilities. For example, “next to” appears eight times in the bigram model (see below). So, you would use the sum 612,358 as the frequency of “next to”.
568013 next to II21 II22
27837 Next to II21 II22
6403 next to II21_MD II22_II
4043 next to MD II
2068 next to MD TO_II
1508 next to MD_II21 II_II22
1419 next to MD TO
1067 next to II21_MD II22_TO
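The following is a sketch of the same chain-rule computation against the downloaded file. It assumes the file is whitespace-delimited with the frequency, word 1, and word 2 in the first three columns (as the excerpt above suggests) and that a Latin-1 read is safe; adjust both assumptions if your copy differs.

```python
# Sum bigram frequencies case-insensitively from ngrams_words_2.txt, then
# apply the same chain-rule estimate as in problem 2.
from collections import defaultdict

bigram_freq = defaultdict(int)      # (w1, w2) -> summed frequency
first_word_freq = defaultdict(int)  # w1 -> summed frequency over all w2

with open("ngrams_words_2.txt", encoding="latin-1") as f:  # encoding assumed
    for line in f:
        fields = line.split()
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # skip headers, blank lines, or malformed rows
        freq, w1, w2 = int(fields[0]), fields[1].lower(), fields[2].lower()
        bigram_freq[(w1, w2)] += freq   # duplicate bigrams sum, per the note
        first_word_freq[w1] += freq

words = "the player is next to the gold".split()
p = 1.0
for w1, w2 in zip(words, words[1:]):
    p *= bigram_freq[(w1, w2)] / first_word_freq[w1]
print(p)
```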