CIS-STA9665-Writing Structured Programs Solved

Chapter 4. Writing Structured Programs 

 

1. Create a list words = ['is', 'it', 'good', '?']. a) Use a series of assignment statements (e.g. words[1] = words[2]) and a temporary variable tmp to transform this list into ['it', 'is', 'good', '!']. b) Now do the same transformation using tuple assignment.
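A minimal sketch of both parts (the variable names words and tmp follow the exercise):

```python
words = ['is', 'it', 'good', '?']

# a) swap the first two elements via a temporary variable,
#    then overwrite the last element
tmp = words[0]
words[0] = words[1]
words[1] = tmp
words[3] = '!'
print(words)  # ['it', 'is', 'good', '!']

# b) the same transformation in a single tuple assignment
words = ['is', 'it', 'good', '?']
words[0], words[1], words[3] = words[1], words[0], '!'
print(words)  # ['it', 'is', 'good', '!']
```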

 

2. Write code that removes whitespace at the beginning and end of a string ('   this   is   a   sample   sentence    ') and normalizes the whitespace between words to a single space character. a) Do this task using split() and join(). b) Do this task using regular expression substitutions.
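One way to do both parts, using the sample string from the exercise:

```python
import re

raw = '   this   is   a   sample   sentence    '

# a) str.split() with no argument splits on runs of whitespace and
#    drops leading/trailing whitespace; ' '.join() rebuilds the string
normalized_a = ' '.join(raw.split())

# b) collapse internal whitespace with a substitution, then strip the ends
normalized_b = re.sub(r'\s+', ' ', raw).strip()

print(normalized_a)  # this is a sample sentence
print(normalized_b)  # this is a sample sentence
```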

 

3. Let sent1 = ['The', 'dog', 'gave', 'John', 'the', 'newspaper']. Now assign sent2 = sent1 and modify sent1[1] = 'monkey'. Review Section 4.1 (Assignment) in Chapter 4 to answer the following questions:

a) Verify that sent2 has changed.

b) Now try the same exercise, but instead assign sent2 = sent1[:]. Modify sent1[1] = 'monkey' and see what happens to sent2. Explain.

c) Now define text1 = [['The', 'dog', 'gave', 'John', 'the', 'newspaper'], ['John', 'is', 'happy']]. Assign text2 = text1[:], then assign a new value to one of the words (text1[0][1] = 'monkey'). Check what happens to text2. Explain.

d) Extract the successive overlapping 4-grams from ['The', 'dog', 'gave', 'John', 'the', 'newspaper'].
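The behavior in parts a)-d) can be checked interactively; a condensed sketch:

```python
# a) assignment copies the reference, not the list: sent1 and sent2
#    name the same object, so the change shows up in both
sent1 = ['The', 'dog', 'gave', 'John', 'the', 'newspaper']
sent2 = sent1
sent1[1] = 'monkey'
assert sent2[1] == 'monkey'

# b) sent1[:] builds a new (shallow) copy, so sent2 keeps the old value
sent1 = ['The', 'dog', 'gave', 'John', 'the', 'newspaper']
sent2 = sent1[:]
sent1[1] = 'monkey'
assert sent2[1] == 'dog'

# c) a shallow copy of a nested list copies only the outer list;
#    the inner lists are still shared, so text2 sees the change
text1 = [['The', 'dog', 'gave', 'John', 'the', 'newspaper'],
         ['John', 'is', 'happy']]
text2 = text1[:]
text1[0][1] = 'monkey'
assert text2[0][1] == 'monkey'

# d) successive overlapping 4-grams via slicing
sent = ['The', 'dog', 'gave', 'John', 'the', 'newspaper']
fourgrams = [sent[i:i + 4] for i in range(len(sent) - 3)]
print(fourgrams)
```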

 

4. Write a function that prints any word that appears in the last 20% of a text and was not encountered earlier. Use text1 from nltk.book to call this function.
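A possible sketch (the function name novel_words is an assumption, not fixed by the exercise):

```python
def novel_words(text):
    """Print, in alphabetical order, each word in the last 20% of
    `text` that did not occur in the first 80%."""
    cutoff = int(0.8 * len(text))
    earlier = set(text[:cutoff])
    for word in sorted(set(text[cutoff:]) - earlier):
        print(word)

# Usage with the book corpora, assuming nltk is installed:
# from nltk.book import text1
# novel_words(text1)
```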

 

 

5. Write a program that takes the sentence ("we have seen two kinds of two sequence objects"), expressed as a single string, splits it, and counts up the tokens. Have it print each token and the token's frequency, one per line, in alphabetical order. You should write a function and call that function to process the sentence.
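One possible solution (the function name print_token_counts is an assumption):

```python
def print_token_counts(sentence):
    """Split a sentence string, count the tokens, and print each token
    with its frequency, one per line, in alphabetical order."""
    counts = {}
    for token in sentence.split():
        counts[token] = counts.get(token, 0) + 1
    for token in sorted(counts):
        print(token, counts[token])

print_token_counts("we have seen two kinds of two sequence objects")
```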

 

6. Write a function shorten(text, n) to process a text ("big big big world today tomorrow good Today good"), omitting the n most frequently occurring words of the text. You should use w.lower() to normalize the text first. Then call this shorten function.
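A sketch under the assumption that "omitting" means removing the n most frequent word types after lowercasing:

```python
from collections import Counter

def shorten(text, n):
    """Lowercase the text's words, then drop the n most frequent
    word types."""
    words = [w.lower() for w in text.split()]
    top = {w for w, _ in Counter(words).most_common(n)}
    return [w for w in words if w not in top]

print(shorten("big big big world today tomorrow good Today good", 2))
```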

 

7. Write a list comprehension that sorts a list of WordNet synsets for proximity to a given synset. For example, given the synsets lesser_rorqual.n.01, killer_whale.n.01, novel.n.01, and tortoise.n.01, sort them according to their shortest_path_distance() from right_whale.n.01. You should use lambda in the sorted() function.
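A sketch of the pattern (the helper name by_proximity is an assumption; running the commented usage requires nltk with the WordNet data downloaded):

```python
def by_proximity(synsets, target):
    # a list comprehension over sorted(), whose lambda key computes the
    # shortest_path_distance() from each synset to the target
    return [s for s in sorted(synsets,
                              key=lambda s: s.shortest_path_distance(target))]

# Usage, assuming nltk and the WordNet data are available:
# from nltk.corpus import wordnet as wn
# names = ['lesser_rorqual.n.01', 'killer_whale.n.01',
#          'novel.n.01', 'tortoise.n.01']
# by_proximity([wn.synset(n) for n in names], wn.synset('right_whale.n.01'))
```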

 

8. Write a function that takes a list of words containing duplicates (e.g. words = ['table', 'chair', 'desk', 'table', 'table', 'chair']) and returns a list of words with no duplicates, sorted by decreasing frequency. E.g. if the input list contained 10 instances of the word table and 9 instances of the word chair, then table would appear before chair in the output list. You should use lambda in the sorted() function.
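A compact sketch (the function name by_frequency is an assumption):

```python
def by_frequency(words):
    """Return the unique words sorted by decreasing frequency,
    using a lambda as the key in sorted()."""
    return sorted(set(words), key=lambda w: -words.count(w))

print(by_frequency(['table', 'chair', 'desk', 'table', 'table', 'chair']))
# ['table', 'chair', 'desk']
```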

 

9. Write a function that takes a text (e.g. text3 from nltk.book) and a vocabulary (e.g. nltk.corpus.words.words()) as its arguments and returns the set of words that appear in the text but not in the vocabulary. Both arguments can be represented as lists of strings. Can you do this in a single line, using set.difference()?
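The single-line body might look like this (the function name unknown_words is an assumption):

```python
def unknown_words(text, vocab):
    # one line: set difference between the text's words and the vocabulary
    return set(text).difference(vocab)

# Usage with the book corpora, assuming nltk is installed:
# import nltk
# from nltk.book import text3
# unknown_words(text3, nltk.corpus.words.words())
```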

 

10. Choose your own web page (in HTML format) and output the 20 most common words on it. You should use w.lower() to normalize the text, and remove stop words, numbers, and punctuation. Define a function and call that function.
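One possible sketch. The function names and the URL are placeholders, and the regex tag-stripping is a crude stand-in for a real HTML parser (the NLTK book uses BeautifulSoup for this); the stop word list is expected from NLTK's stopwords corpus:

```python
import re
from collections import Counter
from urllib.request import urlopen

def top_words_in_html(html, n=20, stop_words=frozenset()):
    """Return the n most common normalized words in an HTML string."""
    text = re.sub(r'<[^>]+>', ' ', html)          # crude tag removal
    tokens = re.findall(r'[a-z]+', text.lower())  # letters only: drops
                                                  # numbers and punctuation
    return Counter(t for t in tokens if t not in stop_words).most_common(n)

def top_words_in_page(url, n=20, stop_words=frozenset()):
    html = urlopen(url).read().decode('utf8', errors='ignore')
    return top_words_in_html(html, n, stop_words)

# Usage, with a placeholder URL and NLTK's stop word list:
# from nltk.corpus import stopwords
# top_words_in_page('https://example.com',
#                   stop_words=set(stopwords.words('english')))
```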
