
Homework 5 Solved

Assignment – High Frequency Words 



Please answer the following questions in an IPython Notebook, posted to GitHub. 

 

1.       Choose a corpus of interest. 

2.       How many total unique words are in the corpus?  (Please feel free to define unique words in any interesting, defensible way). 

3.       Taking the most common words, how many unique words represent half of the total words in the corpus? 

4.       Identify the 200 highest frequency words in this corpus. 

5.       Create a graph that shows the relative frequency of these 200 words. 

6.       Does the observed relative frequency of these words follow Zipf’s law?  Explain. 

7.       In what ways do you think the frequency of the words in this corpus differs from “all words in all corpora”?
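
As a starting point for questions 2 through 6, the counting steps can be sketched in plain Python. This is only a minimal sketch under stated assumptions, not a required approach: the tokenizer (lowercase runs of ASCII letters, with optional internal straight apostrophes) is one of many defensible definitions of a "word", and the function names `tokenize`, `words_covering_half`, `relative_frequencies`, and `zipf_slope` are hypothetical helpers introduced here for illustration. The sample string stands in for a real corpus.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase words.

    Assumption: a word is a run of ASCII letters, optionally with internal
    straight apostrophes (e.g. "don't"). Other definitions are equally valid.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def words_covering_half(counts):
    """Number of most-common unique words whose counts sum to at least
    half of all word occurrences in the corpus."""
    total = sum(counts.values())
    running = 0
    for n, (_, count) in enumerate(counts.most_common(), start=1):
        running += count
        if 2 * running >= total:
            return n
    return 0  # empty corpus


def relative_frequencies(counts, top_n=200):
    """(word, count / total) pairs for the top_n most common words."""
    total = sum(counts.values())
    return [(word, count / total) for word, count in counts.most_common(top_n)]


def zipf_slope(counts, top_n=200):
    """Least-squares slope of log(count) against log(rank) for the top_n words.

    Under an idealized Zipf's law (frequency proportional to 1 / rank),
    this slope is close to -1.
    """
    freqs = [count for _, count in counts.most_common(top_n)]
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Placeholder text; replace with the contents of the chosen corpus.
sample_text = "The cat and the hat. The hat sat on the cat, and the cat sat still."
counts = Counter(tokenize(sample_text))

unique_words = len(counts)                # question 2
half_cover = words_covering_half(counts)  # question 3
top_words = relative_frequencies(counts)  # questions 4 and 5
slope = zipf_slope(counts)                # question 6
```

For the graph in question 5, the pairs returned by `relative_frequencies` can be plotted against rank with any plotting library; log-log axes make a Zipf-like pattern appear as a roughly straight line. A slope near -1 from `zipf_slope` is consistent with Zipf's law, but a small sample like the placeholder above is too short to say anything meaningful.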

