Purpose: The purpose of this assignment is to allow you practice ArrayList, Interfaces, Generics and Linked Lists.
Part 1: Interfaces, Generics & ArrayList
In 1935, the American linguist George Zipf noticed something very peculiar with the books he was reading... whenever he would count the words in his books, he noticed that most of the words appeared only once and a small number of words appeared very frequently. In addition, this phenomenon seemed to hold true for any text and in any language. Zipf analysed this a little further, and noticed that if you sort the words by frequency and plot the frequency of words versus their rank in the sorted list, you will always get a graph similar to the one below, where a few words have a high frequency (the peak on the left), and most words appear only once (the long tail on the right).
In technical terms:
- Words that appear frequently (the peak on the left) and that happen to be short are called stop-words.
- Words that only appear once (the tail on the right) are called happax legomena.
Today, this behaviour of language is known as Zipf’s law and is used tremendously in the Search Engine industry. In fact, it turns out that Zipf’s law is also applicable to many other domains and applications such as monitoring traffic (on the Internet or on the highway), predicting financial activities, and even analysing animal behaviour...
Write a program to verify Zipf’s law with your favourite text. Specifically, your program must ask the user for an input text file (please handle potential I/O errors properly), and count how many words the file contains and display the following statistics:
- Display each word in the text along with its rank and frequency.
You can assume that a word is defined as anything that the method Scanner.next() returns and only contains letters. For example “U2”, “data-base” and “hi!” should not be counted as words.
Your program does not have to be case-sensitive. For example, the words “hi” and “Hi” can be considered as same words.
The list of rank/word/frequency must be displayed in descending order of frequency, and all words with the same frequency must be displayed in alphabetical order (uppercases before lowercases).
- Display the total number of word tokens and word types.
The number of word tokens refers to the total number of words in the text (the number of times the method Scanner.next() returned a String); whereas the number of word types refers to the number of different words in the text. For example, if the word “the” appears 30 times, it will count as 30 word tokens, but only 1 word type.
- Display statistics on the happax legomena:
§ the number of happax,
§ the percentage of happax (nb happax ÷ nb of word types), and
§ how many of the tokens in the text they account for (nb happax ÷ nb of word tokens).
- Display statistics on the stops words:
§ the number of stop words,
§ the percentage of stop words (nb stop words ÷ nb of word types), and
§ how many of the tokens in the text they account for (total frequency of stop stops ÷ nb of word tokens). For this assignment, you can assume that a stop word is a word that has a length of 4 characters or less and that appears at least 10 times in the text.
Here is an example of how your program should behave (the file jokes.txt is available with this assignment):
Enter input file:
------------------------------------
RANK FREQ
------------------------------------
1 17
2 10
3 7
4 7
5 5
6 5
7 4
8 4
9 4
10 4
11 4
12 4
13 4
14 3
...
34 1
35 1
36 1
...
51 1
52 1
53 1
54 1
55 1
...
174 1
175 1
176 1
177 1
178 1
---------------------------------
Nb of word tokens: 268
Nb of word types: 178
Nb of Happax: 145
% of Happax: 81%
Happax account for: 54% of the text
Nb of stop words: 6
% of stop words: 3%
jokes.txt
Bakers
Count
about
backward baseball batteries
wondered write writes
---
WORD
a A in was the you The When
are is of who your and
Did
all
why
will
Stop words account for: 19% of the text
Part 2: Linked Lists
Write a program to manipulate Cellular Phones using linked lists.
I) The CellPhone class has the following attributes: a serialNum (long type), a brand (String type), a year (int type, which indicates manufacturing year) and a price (double type). It is assumed that brand name is always recorded as a single word (i.e. Motorola, SonyEricsson, Panasonic, etc.). It is also assumed that all cellular phones follow one system of assigning serial numbers, regardless of their different brands, so no two cell phones may have the same serial number.
You are required to write the implementation of the CellPhone class. Beside the usual mutator and accessor methods (i.e. getPrice(), setYear()) the class must have the following:
(a) Parameterized constructor that accepts four values and initializes serialNum, brand, year and price to these passed values;
(b) Copy constructor, which takes two parameters, a CellPhone object and a long value. The newly created object will be assigned all the attributes of the passed object, with the exception of the serial number. serialNum is assigned the value passed as the second parameter to the constructor. It is always assumed that this value will correspond to the unique serial number rule;
(c) clone() method. This method will prompt the user to enter a new serial number, then creates and returns a clone of the calling object with the exception of the serial number, which is assigned the value entered by the user;
(d) Additionally, the class should have a toString() and an equals() methods. Two cell phones are equal if they have the same attributes, with the exception of the serial number, which could be different.
II) The file Cell_Info.txt, which one of its versions is provided with this assignment, has the information of various cell phone objects. The file may have zero or more records. The information stored in this file is always assumed to be correct and following the unique serial number rule. A snapshot of the contents of the Cell_info.txt file is shown in Figure 1 below.
Figure 1: Cell_info.txt
III) The CellList class has the following:
(a) An inner class called CellNode. This class has the following:
i. Two private attributes: an object of CellPhone and a pointer to a CellNode object;
ii. A default constructor, which assigns both attributes to null;
iii. A parameterized constructor that accepts two parameters, a CellPhone object and a CellNode object, then initializes the attributes accordingly;
iv. A copy constructor;
v. A clone() method;
vi. Other mutator and accessor methods.
(b) A private attribute called head, which should point to the first node in this list object;
(c) A private attribute called size, which always indicates the current size of the list (how many nodes are in the list); (d) A default constructor, which creates an empty list;
(e) A copy constructor, which accepts a CellList object and creates a copy of it;
(f) A method called addToStart(), which accepts one parameter, an object from CellPhone class. The method then creates a node with that passed object and inserts this node at the head of the list;
(g) A method called insertAtIndex(), which accepts two parameters, an object from CellPhone class, and an integer representing an index. If the index is not valid (a valid index must have a value between 0 and size-1), the method must throw a NoSuchElementException and terminate the program. If the index is valid, then the method creates a node with the passed CellPhone object and inserts this node at the given index. The method must properly handle all special cases;
(h) A method called deleteFromIndex(), which accepts one integer parameter representing an index. Again, if the index is not valid, the method must throw a NoSuchElementException and terminate the program. Otherwise; the node pointed by that index is deleted from the list. The method must properly handle all special cases;
(i) A method called deleteFromStart(), which deletes the first node in the list (the one pointed by head). All special cases must be properly handled.
(j) A method called replaceAtIndex(), which accepts two parameters, an object from CellPhone class, and an integer representing an index. If the index is not valid, the method simply returns; otherwise the object in the node at the passed index is to be replaced by the passed object;
(k) A method called find(), which accepts one parameter of type long representing a serial number. The method then searches the list for a node with a cell phone with that serial number. If such an object is found, then the method returns a pointer to that node where the object is found; otherwise, the method returns null. The method must keep track of how many iterations were made before the search finally finds the phone or concludes that it is not in the list;
(l) A method called contains(), which accepts one parameter of type long representing a serial number. The method returns true if the an object with that serial number is in the list; otherwise, the method returns false;
(m) A method called showContents(), which displays the contents of the list, in a similar fashion to what is shown in Figure 2 below.
(n) A method called equals(), which accepts one parameter of type CellList. The method returns true if the two lists contain similar objects; otherwise the method returns false. Recall that two CellPhone objects are equal if they have the same values with the exception of the serial number, which can, and actually is expected to be, different.
Figure 2: Sample of Displaying the Contents of a CellList
è Finally, here are some general rules that you must consider when implementing the above methods:
- Whenever a node is added or deleted, the list size must be adjusted accordingly;
- All special cases must be handled, whether or not the method description explicitly states that;
- All clone() and copy constructors must perform a deep copy; no shallow copies are allowed;
- If any of your methods allows a privacy leak, you must clearly place a comment at the beginning of the method 1) indicating that this method may result in a privacy leak 2) explaning the reason behind the privacy leak. Please keep in mind that you are not required to implement these proposals;
IV) Now, you are required to write a public class called CellListUtilization. In the main() method, you must do the following:
(a) Create at least two empty lists from the CellList class;
(b) Open the Cell_Info.txt file, and read its contents line by line. Use these records to initialize one of the
CellList objects you created above. You can simply use the addToStart() method to insert the read objects into the list. However, the list should not have any duplicate records, so if the input file has duplicate entries, which is the case in the file provided with the assignment for instance, your code must handle this case so thast each record is inserted in the list only once;
(c) Show the contents of the list you just initialized;
(d) Prompt the user to enter a few serial numbers and search the list that you created from the input file for these values. Make sure to display the number of iterations performed;
(e) Following that, you must create enough objects to test each of the constructors/methods of your classes. The details of this part are left as open to you. You can do whatever you wish as long as your methods are being tested including some of the special cases.