Starting from:

$25

CS2420–  Program 4 Hashing Poetry Solved

Part 1: Becoming familiar with the code

HashTable code has been given to you.  No testing program has been provided.  To become familiar with how the code works, try reading in a small input file and make sure you can create a hash table of those entries.

Testing: Make sure the following works:

a. Insert values

b. Delete values

c. Find values

d. Printing the contents of the hash table.

e. Control the size of the hash table.

What happens if you attempt to delete an item that isn’t there?

What if you add more things than can fit into the hash table?

Modification: You have been given the hash table code from your text for doing quadratic probing.  Modify the code so it takes the key and the associated object as two separate parameters.

Please retain the generic structure of the hash table.  Be careful not to add anything to the hash table that only makes sense for this specific problem.

Part 1 is for your benefit.  The code you turn in does not need to show the results of this experimentation. 

 

Part 2 Using the code to solve a bigger problem

This assignment is designed to give you experience with hash tables.  In this assignment, you will randomly create a new verse of poetry (based on the pattern seen in the existing poem).

You will fit a bi-word language model to English and then use it to generate a poem.  A uni-word model of English consists of a single probability distribution P(W) over the set of all words.

A bi-word model of English consists of two probability distributions: P(W) and P(Wi+1 | Wi). The first distribution is just the probability a word in a document. The second distribution is the probability of seeing word Wi+1 given that the previous word was Wi.

Given a set of documents (in our case, various poems), your job in this assignment is to generate a poem from a given word.  The method is simple. You just need to decide which word should follow a given word.  The word selected is based on the probability in which it occurs (in the file) after the last word used.  So for example, in the text below, if I ask, “What should follow ‘sam’”?, you pick ‘I’ most of the time  (7 out of 8 times) and ‘sam’ the rest of the time.  No other word follows ‘sam’.

I am Sam. I am Sam. Sam I Am.

That Sam I Am! That Sam I Am! I do not like that Sam I Am!

Do you like green eggs and ham?

I do not like them, Sam I Am.

I do not like green eggs and ham.

Would you like them here or there?

I would not like them here or there.

I would not like them anywhere.

I do not like green eggs and ham.

I do not like them, Sam I Am.

Thus you need to keep track of the number of times that each word follows every other word  (so you can figure probabilities). 

Keep track of all words (and their probabilities) using a hash table in which you hash on word W.   For the input above, partial contents of the hash table are shown below.  Thus, “do” occurs 6 times in the file.  Five times “do” is followed by “not” and one time it is followed by “you”



Similarly, for the input file, when you see the word “sam”, seven times out of 8 “i” is the next word.  One time out of 8, “sam” is the next work.

Given the information in this data structure, we can compute the conditional probability that “i” occurs once you are looking at word “sam” P(i|sam) as 7/8. Similarly, we can compute the probability P(ham|and) as 3/3 = 1.  When I ask what should follow “sam”, you will pick the next word given the probabilities seen.  In other words,  87.5% of the time you will want ‘i’ to come next (after sam).

Given a specific input file,  you are to generate a poem of a given length from a specific starting word.  So for example, poem(“sam”, 20) is to generate a poem of 20 words that begins with the word sam.    Your poem might look like: “sam I am sam I am I am I would not like them here or there I am do not like” or “sam I would you like green eggs and ham I am do you like them sam I am I am I”

Specifics

For the data, remove all special characters and punctuation and change everything to lower case. 

Output

There are several data files associated with the test.    You will generate poems from the following specification:

File
Beginning word
Length of Poem
Print Hash Table?
green.txt
sam
20
Yes 
test.txt
that
20
Yes
clown.txt
go
20
Yes 
inch.txt
computer
50
no
Poe.txt
nevermore
50
no
Seuss.txt
mordecai
50
no
For grading purposes, you are asked to print the contents of the hash table for some of the test cases. This will be a huge benefit to you in debugging.   Because the next word is generated via a random number generator,  you should not expect your output to match that of others.

Hints

In order to generate which word follows word WORD with the correct probability, I used the following strategy.  I kept track of how many times WORD occurred (call it occurCt) and how many times each word followed WORD (followCt).   I generated a random number between 0 and occurCt.  Then I considered each word which followed WORD in order.  I added up the followCts of each following word until the sum of the followCts passed the random number.  When the sum passed the random number, that is the word I selected to be the next word in the poem.

More products