CPTS540-Homework 11 Solved

1.     Consider the 3×3 Wumpus world shown below. The goal of this simplified game is to be collocated with the gold (where we get a +1000 reward) and never collocated with the Wumpus or a pit (or we get a -1000 reward). All other states have a reward of -1. As before, the agent starts in [1,1], but has only four possible actions: Up, Down, Left, Right (there is no orientation or turning). Each action always succeeds, although attempting to move into a wall leaves the agent where it is. We will use reinforcement learning to solve this problem.

[Figure: 3×3 Wumpus world grid and policy, from the original assignment]
a.      Compute the utility U(s) of each non-terminal state s given the policy shown above. Note that [1,3], [2,3], [3,3], and [3,1] are terminal states, where U([1,3]) = -1000, U([2,3]) = +1000, U([3,3]) = -1000, and U([3,1]) = -1000. You may assume γ = 0.9.
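Under a fixed deterministic policy with deterministic actions, the utilities satisfy U(s) = R(s) + γ·U(π(s)) and can be computed by chaining backward from the terminal states. A minimal sketch follows; the policy dictionary is a hypothetical placeholder, since the actual arrows come from the figure.

```python
# Policy evaluation sketch for problem 1a. States are (x, y) tuples.
# POLICY below is a HYPOTHETICAL stand-in for the arrows in the figure.

GAMMA = 0.9
TERMINAL = {(1, 3): -1000, (2, 3): 1000, (3, 3): -1000, (3, 1): -1000}

# Hypothetical policy: each non-terminal state maps to its successor.
POLICY = {
    (1, 1): (1, 2),
    (1, 2): (2, 2),
    (2, 1): (1, 1),
    (2, 2): (2, 3),   # step onto the gold
    (3, 2): (2, 2),
}

MEMO = {}

def utility(s):
    """U(s) = R(s) + gamma * U(pi(s)); R(s) = -1 for non-terminals."""
    if s in TERMINAL:
        return TERMINAL[s]
    if s not in MEMO:
        MEMO[s] = -1 + GAMMA * utility(POLICY[s])
    return MEMO[s]

for state in POLICY:
    print(state, round(utility(state), 3))
```

With this placeholder policy, for example, U((2,2)) = -1 + 0.9·1000 = 899 and U((1,2)) = -1 + 0.9·899 = 808.1; substituting the real policy gives the answer to 1a.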

b.     Using temporal difference Q-learning, compute the Q values for Q([1,1],Right), Q([2,1],Up), Q([2,2],Up), after each of ten executions of the action sequence: Right, Up, Up (starting from [1,1] for each sequence). You may assume α = 0.9, γ = 0.9, and all Q values for non-terminal states are initially zero.
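The repeated episode can be traced mechanically with the standard TD Q-learning update Q(s,a) ← Q(s,a) + α(r + γ·max_a' Q(s',a') − Q(s,a)). The sketch below assumes the agent receives the reward of the state it enters (so the final Up into the gold square yields +1000); check that convention against the lecture notes.

```python
# TD Q-learning sketch for problem 1b. States are (x, y) tuples.
# Reward convention (reward received on ENTERING a state) is an assumption.

ALPHA, GAMMA = 0.9, 0.9
ACTIONS = ["Up", "Down", "Left", "Right"]
TERMINALS = {(1, 3), (2, 3), (3, 3), (3, 1)}

Q = {}  # (state, action) -> value, default 0

def q(s, a):
    return Q.get((s, a), 0.0)

# One execution of Right, Up, Up from [1,1]:
# (state, action, next_state, reward on entering next_state)
EPISODE = [
    ((1, 1), "Right", (2, 1), -1),
    ((2, 1), "Up",    (2, 2), -1),
    ((2, 2), "Up",    (2, 3), 1000),  # (2,3) holds the gold: terminal
]

for _ in range(10):  # ten executions of the sequence
    for s, a, s2, r in EPISODE:
        best_next = 0.0 if s2 in TERMINALS else max(q(s2, b) for b in ACTIONS)
        Q[(s, a)] = q(s, a) + ALPHA * (r + GAMMA * best_next - q(s, a))
        print(s, a, round(Q[(s, a)], 3))
```

For instance, after the first execution Q([2,2],Up) = 0 + 0.9·(1000 + 0 − 0) = 900, and it approaches 1000 over the remaining executions.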


2.     Given the following bigram model, compute the probability of the two sentences below. Show your work.

Word 1     Word 2     Frequency
the        player         2,000
player     is             1,000
is         next           3,000
next       to             4,000
to         the            6,000
to         a              5,000
the        gold           2,000
a          pit            1,000

a.      “the player is next to the gold”

b.     “the player is next to a pit”
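Each conditional probability is the bigram frequency divided by the total frequency of its first word, e.g. P(the | to) = 6,000 / (6,000 + 5,000). A sketch of the computation follows; it ignores the probability of the first word itself (an assumption — a start-of-sentence token could be used instead).

```python
# Bigram sentence probability for problem 2, using the table above.
# P(w2 | w1) = freq(w1, w2) / sum over w of freq(w1, w).
from collections import defaultdict

BIGRAMS = {
    ("the", "player"): 2000, ("player", "is"): 1000,
    ("is", "next"): 3000,    ("next", "to"): 4000,
    ("to", "the"): 6000,     ("to", "a"): 5000,
    ("the", "gold"): 2000,   ("a", "pit"): 1000,
}

# Total frequency of each first word, for the denominators.
totals = defaultdict(int)
for (w1, _), freq in BIGRAMS.items():
    totals[w1] += freq

def sentence_prob(sentence):
    words = sentence.lower().split()
    p = 1.0
    for w1, w2 in zip(words, words[1:]):
        p *= BIGRAMS.get((w1, w2), 0) / totals[w1]
    return p

print(sentence_prob("the player is next to the gold"))
print(sentence_prob("the player is next to a pit"))
```

For sentence (a) this gives 0.5 · 1 · 1 · 1 · (6/11) · 0.5 ≈ 0.136, and for sentence (b) 0.5 · 1 · 1 · 1 · (5/11) · 1 ≈ 0.227.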

 

3.     Suppose we add the following two words to the lexicon on slide 23 of the lecture notes:

Noun → player

Adverb → next

Using this augmented lexicon and the grammar on slide 24 of the lecture notes, show all possible parse trees for each of the following sentences. If there is no parse, then show a new grammar rule consisting of only non-terminals that will allow the sentence to be parsed, and show the parse tree.

a.      “the player is next to the gold”

b.     “the player is next to the gold in 2 3”

c.      “the player is next to the gold and a pit”

4.     Repeat problem 2a, but using the bigram model available from www.ngrams.info. Specifically, download the zip file available from https://www.ngrams.info/iweb/iweb_ngrams_sample.zip. This zip file contains a bigram model in the file “ngrams_words_2.txt”. Use this file to compute the probability of the sentence in problem 2a. Show your work. Notes:

•        Ignore case in the bigram model, i.e., treat all letters as lowercase. For example, when computing the frequency of “is”, sum the frequencies of all bigrams that begin with “is”, “Is” and “IS”.

•        If a bigram appears more than once in the model, then use the sum of all the frequencies for calculating probabilities. For example, “next to” appears eight times in the bigram model (see below). So, you would use the sum 612,358 as the frequency of “next to”.

                  568013   next   to   II21      II22
                  27837    Next   to   II21      II22
                  6403     next   to   II21_MD   II22_II
                  4043     next   to   MD        II
                  2068     next   to   MD        TO_II
                  1508     next   to   MD_II21   II_II22
                  1419     next   to   MD        TO
                  1067     next   to   II21_MD   II22_TO
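The case-folding and duplicate-summing rules above can be sketched as below. The field layout (frequency, word 1, word 2, two POS tags) is inferred from the sample rows; for the real file, pass the lines of “ngrams_words_2.txt” instead of the sample.

```python
# Aggregating the iweb bigram model (problem 4): fold case and
# sum frequencies of duplicate bigrams. Field layout is inferred
# from the sample rows shown above.
from collections import defaultdict

def aggregate_bigrams(lines):
    bigram_freq = defaultdict(int)      # (w1, w2) -> summed frequency
    first_word_freq = defaultdict(int)  # w1 -> total, for denominators
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        freq = int(fields[0])
        w1, w2 = fields[1].lower(), fields[2].lower()
        bigram_freq[(w1, w2)] += freq   # duplicates are summed
        first_word_freq[w1] += freq
    return bigram_freq, first_word_freq

SAMPLE = """\
568013 next to II21 II22
27837 Next to II21 II22
6403 next to II21_MD II22_II
4043 next to MD II
2068 next to MD TO_II
1508 next to MD_II21 II_II22
1419 next to MD TO
1067 next to II21_MD II22_TO"""

bigrams, totals = aggregate_bigrams(SAMPLE.splitlines())
print(bigrams[("next", "to")])  # 612358, the sum quoted above
```

With the full file loaded, the sentence probability is then computed exactly as in problem 2, dividing each bigram frequency by the summed frequency of its first word.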
