Starting from:

$25

COSC122 - Introduction to Data Structures and Algorithms - Lab 3 - Linked Lists - Solved

Goals
This lab will provide you with some practice with Linked Lists. In this lab you will:

•   implement stack and queue data structures using linked lists and

•   work with a linked list structure for counting characters

Preparation
•   You should be familiar with the Linked List material in Chapter 3 of the textbook before attempting this lab (online link here).

•   We’ll be using stacks and queues again, so make sure you are familiar with them from the previous topic.

Implementing a Stack and a Queue
The provided linked_list_structures module contains two skeleton classes: Stack and Queue that use a linked list to implement the interfaces for working with a stack and a queue. However, they’re missing implementations for the most important methods: push, pop, peek, enqueue and dequeue!

Remember that a stack is ‘last-in–first-out’ (LIFO), and a Queue is ‘first-in–first-out’ (FIFO). The stack implementation should push and pop items from the head of the list; with the queue implementation it works out easiest if items are enqueued at the tail and dequeued from the head of the list. Your initial task is to:

•   Complete the missing implementations for both the Stack and Queue classes in the linked_list_structures module.

–   The Stack and Queue must be implemented as a linked lists of Node objects and a Node class is defined at the start of the linked_list_structures module—check it out now.

–   The doctests for the Stack and Queue class not only check that you are using linked lists, they give you some indication of how to use the linked lists.

–   The head of the linked list will effectively be the top of a stack or the front of a queue.

–   Each Node holds an item and a pointer to the next Node object (which will be None if there is nothing after it).

•   Implement the __len__ methods so that they return the number of nodes in the stack/queue. Remember Python will interpret len(q) as q.__len__(), ie, so we don’t need to call q.__len__() directly when using our stacks/queues.

It is important to test the extreme cases such as deleting from a list of 1 item, adding to a list of 0 items etc. Most of these cases are included in the doc tests (but not all). When you have completed the Stack and Queue classes you can test them with the doctests (by running the file in Wing) and/or by manually experimenting with the classes, for example:

>>> from linked_list_structures import Stack

>>> s = Stack() >>> s.push('a')

>>> print(s)

Stack: head/top -> a -> None

>>> s.pop() 'a'

>>> print(s)

Stack: head/top -> None

>>> s.push('b')

>>> print(s)

Stack: head/top -> b -> None

>>> s.push('c')

>>> len(s)

2

>>> s.peek()

'c'

>>> print(s)

Stack: head/top -> c -> b -> None

>>> s.pop()

>>> from linked_list_structures import Queue

>>> q = Queue()

>>> q.enqueue('a')

>>> print(q)

Queue: head/front -> a -> None

>>> len(q)

1

>>> q.enqueue('b')

>>> print(q)

Queue: head/front -> a -> b -> None

>>> q.enqueue('c')

>>> print(q)

Queue: head/front -> a -> b -> c -> None

>>> len(q)

3

>>> q.dequeue() 'a'

>>> print(q)

Queue: head/front -> b -> c -> None

'c'

> Once you have implemented these, complete Linkystacksandqueues questions in Lab Quiz 3.

Now that you have working stack and queue data structures, you should make the enqueue method faster by adding a tail pointer, pointing to the end of the list.

•   Open the module queue2.py.

•   Complete the queue class so that it uses both head and tail pointers. You should be able to copy most of your code over from your previous queue but you will need to re-write the enqueue and dequeue methods—we have supplied new doctests that will check your code deals with the head and tail pointers appropriately.

> When you have this new version worksing, complete the HeadsandTails questions in Lab Quiz 3.

Calculating Letter Frequencies
The next task is to read a text corpus (collection of text) and produce the statistics counting the number of times each character (or pair of characters) appeared in the text using a linked list.

To see applications of this type of analysis, check out Frequency Analysis on Wikipedia. [1] Frequency information can be used to help with predictive text on cellphones, email completion in your email client, etc. For example:

•   If someone types 't' what is the most likely next character?

•   What is the most likely character after 'ther' has been typed?

FreqList overview
Each FreqNode in the list will store the following values:

1.    an item (such as a character or pair of characters), and

2.    a frequency — the number of times the item has been seen in the text.

3.    a next_node pointer that points to the next FreqNode in the list.

Important Notes:

1.    Initially the linked list is empty (the head points to None).

2.    We will start by doing frequencies for characters but will move on to calculating frequencies for pairs of characters, so your methods should not assume we are always using single characters.

FreqList

Instance Variables:Instance Variables:

head:head:

                                                                                        FreqNode              FreqNode

                                                                                 item: 'h'item: 'h'     item: 't'item: 't'

                        MethodsMethods              frequency: 1frequency: 1 frequency: 1frequency: 1

 __init____init__        next_node:next_node:            next_node:next_node:            None addadd

get_item_frequencyget_item_frequency get_xy_for_plotget_xy_for_plot

__repr____repr__

__len____len__

Figure 1: Simple object model overview

Program output
The output from your program should look like the following (for letter pairs):

1: 't ' = 12345

2: ' e' = 12222 3: 'th' = 12013

4: 'he' = 11987

... etc

•   The first number in each line is the index number of the letter (or group of letters).

•   The last number is the frequency (for example the first line has an index on 1, and shows that the letter pair ’t ’ has a frequency of 12345).

•   The index number shows you the order in which the items are stored in the linked list.

•   The index number also shows the frequency rank in the case of the sorted frequency list. The example output is taken from a sorted frequency list and therefore the most frequent pair was 't ' and the second most frequent was ' e'.

•   No distinction should be made between UPPERCASE and lowercase letters.

Counting the frequencies—overview
The letter_frequencies module provides some skeleton code. For each item in the text, your program should:

• check whether it already exists in the list.

–   If so, increment its frequency count by 1.

–   If not then add a node to the linked list (initially add at the start of the list as this will be the quickest way to add the node) for that item and set its frequency count to 1.

Three sample text files are given in le_rire.txt, ulysses.txt, and war_and_peace.txt - listed from smallest to largest file-size. Tolstoy’s War and Peace is one of the longest novels ever written, weighing in at around half a million words, but it is less than half the size of Proust’s In Search of Lost Time, which has around 1.2 million words [2].

Single Character Frequencies
You are required to:

•   Complete the add method in the UnsortedFreqList class in the letter_frequencies module.

•   Run it on all the sample text files to provide the counts for each letter in each corpus (eg, to run on le_rire.txt you should uncomment run_tests("le_rire.txt", verbose=True) in the main function and, if needed, un-comment the line in the run_tests function that adds the UnsortedFreqList test for single characters to the list of tests to run, ie, the run_settings.append((UnsortedFreqList, 1, verbose))).

We recommend testing your programs fully with le_rire.txt before you move on to the longer files. You can ignore the doctest errors for other functions at this stage. Your output from a single character run, with UnsortedFreqList, should give you something like the left column of the following table.

--------------------------------------------------

Tests for: le_rire.txt

Doc size: 246941 chars

-------------------------------------------------1 char(s) -> t = 0.8862s (40 items)

Unsorted Frequency List

-----------------------------------

1: '@' = 2

2: '%' = 1

3: '/' = 26

4: '!' = 43

5: '"' = 262

6: ';' = 159 7: '?' = 89

8: 'q' = 194

9: 'x' = 387

10: 'z' = 24

11: ''' = 112

12: '*' = 28

13: '#' = 1

14: '.' = 1851

15: 'v' = 1897

16: 'd' = 6114

17: 'w' = 3824

18: ',' = 3178

19: 'i' = 15917

20: 'm' = 5441

21: 'y' = 3461

22: 's' = 13342 23: ':' = 181

24: 'a' = 15336

25: 'l' = 7836

26: 'f' = 4953

27: 'k' = 1001

28: 'b' = 2846

29: 'n' = 13878

30: 'u' = 5545

31: 'g' = 4055

32: 'c' = 6811

33: 'j' = 294

34: 'o' = 15142

35: 'r' = 11188 36: 'p' = 3679

37: ' ' = 42837 38: 'e' = 24941

39: 'h' = 10597

40: 't' = 19468
--------------------------------------------------

Tests for: le_rire.txt

Doc size: 246941 chars

-------------------------------------------------1 char(s) -> t = 0.4403s (40 items)

Nicer Unsorted Frequency List

-----------------------------------

1: 't' = 19468

2: 'h' = 10597

3: 'e' = 24941

4: ' ' = 42837 5: 'p' = 3679

6: 'r' = 11188

7: 'o' = 15142

8: 'j' = 294

9: 'c' = 6811 10: 'g' = 4055

11: 'u' = 5545

12: 'n' = 13878 13: 'b' = 2846

14: 'k' = 1001

15: 'f' = 4953

16: 'l' = 7836

17: 'a' = 15336 18: ':' = 181

19: 's' = 13342 20: 'y' = 3461

21: 'm' = 5441

22: 'i' = 15917 23: ',' = 3178

24: 'w' = 3824

25: 'd' = 6114

26: 'v' = 1897

27: '.' = 1851

28: '#' = 1

29: '*' = 28

30: ''' = 112 31: 'z' = 24

32: 'x' = 387

33: 'q' = 194 34: '?' = 89

35: ';' = 159

36: '"' = 262 37: '!' = 43

38: '/' = 26 39: '%' = 1

40: '@' = 2
|
Your next challenge is now to:

•   Implement the add method in the NicerUnsortedFreqList class. This class should add new items to the end of the list rather than the beginning. Adding an item to the end of the list will take longer than adding to the start of the list but, as the name suggests, this works better overall. See the right column of the above output.

•   Compare the speed of this nicer method to the speed of the original method.

•   Think about how many times new letters will be added to the frequency list in the course of processing a document and the position of characters such as 't', 'h' and 'e' in the frequency list. Explain why inserting new letters at the end of the list vs. the beginning is likely to speed up corpus processing.

> After implementing the new add method, complete the Simple character frequencies questions in Lab Quiz 3.

Character Pair Frequencies
Now that you have your code running for single characters, do the following:

•   Run it for character pairs

–   for example by un-commenting run_settings.append((UnsortedFreqList, 2, verbose))

–   There should be a more noticeable gap between the UnsortedFreqList and the NicerUnSortedFreqList

.

•   Complete the SortedFreqList to calculate the letter frequencies and store the nodes from highest frequency count to lowest.

–   This is usually faster because the majority of characters being incremented will be near the front of the list. To keep the list in order you will need to move nodes to their sorted position when needed.

–   If a node has its count incremented and its count is now greater than the node before it then the node needs to be moved. It will obviously need to be before the previous node but it may need to go further as their may be many nodes with the same frequency as the previous node.

–   The way we recommend moving an updated node to its correct position by ’unlinking the node from the list, keeping a pointer to it. Then inserting it back into the list in the correct position.

–   IMPORTANT NOTE: we supply the _insert_in_order method that will help with this approach. Please read the comments in the add and _insert_in_order methods so you understand how and when to use the _insert_in_order, for example, you will never need to use _insert_in_order on an empty list as a node can’t have higher frequency than its previous node if the list is

empty.

•   Compare the timings of your SortedFreqList, UnsortedFreqList and NicerUnsortedFreqList implementations.

–   Why is SortedFreqList only slightly faster than the NicerUnsortedFreqList?

> After implementing this, complete the SortedFrequencies questions in web quiz 3.

Extras
•   To get a graphical representation of the frequencies use the plot_freq_list function provided in the source file. Only plot one graph per run - python will give you grief if you try more. We recommend only graphing the single character frequencies as the graph for character pairs will have far too many x-axis items! If you get a crash when running graphs you may need to restart the python shell in Wing, to do this simply click on the options button in the shell window and select restart shell...

•   Implement a FrequencyDictionary that stores frequencies in a Python dictionary and see how well it stacks up versus the SortedFreqList. The timings should show why this the way most people would build the frequency values in the real-world.

•   Write a function such as make_sorted_words_freq_list to calculate the frequencies of words in the documents. For simplicity assume a word is any set of characters between spaces, eg, in 'Wiggle and Woggle are two, very strange, dances.' we would consider 'two,' and 'strange,' to be words, along with the more obvious words such as Wiggle and are. Basically using .split() will return the list of words that you want. But you may be able to process the files quicker if you scan through, character by character, building words and adding to the freq_list as you go—this saves building a huge list of words before you start processing.

•   Another approach to implementing the SortedFreqList is to use a doubly linked list and move back through the list until the appropriate spot is found and insert the node there—this should be slightly quicker when moving nodes as it won’t have to go all the way back to the start of the list each time. But, you will need to rewrite your nodes to have two links and make sure that both links are updated properly when operating on lists of them. Making a version that uses this approach is a good exercise for more motivated students. Take a copy of your code and change your implementation to use a doubly linked list. Note: don’t submit this code to the quiz question - use the singly linked list implementation (the quiz server will provide the Node class as given in the original lab code...).


 
[1] http://en.wikipedia.org/wiki/Frequency_analysis
[2] It’s too bad that In Search of Lost Time is not available on project Gutenberg.

More products