Starting from:

$35

COMP5421- Assignment 2 Solved

1              Objectives
In this assignment, you will redo Assignment 1 without getting involved with dynamic storage management. A major goal in this and the following assignments is to encourage you to leverage the C++ standard and popular libraries to do the work for you so you don’t ever have to reinvent the wheel.

Although assignment 1 can be ideally and conveniently implemented using associative containers such as std::map and std::set, this assignment requires that class IntList be replaced with std::vector<int>, class TList with std::list<Token>, and C-style text processing with std::string.

 
 
 

std::vector<int>
In addition to implementing the bulk of the work in this assignment, the vector<int> and list<Token> classes also provide plenty opportunities for us to learn about container iterators, which we will use in class Tokenizer to iterate through and modify the container elements.

Class Tokenizer provides the functionalities of A1’s TList class that are not directly supported by std::list<Token>, plus a few more functionalities.

2              String tokens
In assignment 1, a string token is defined as a sequence of contiguous characters excluding white-space characters such as space, tab, and newline characters.

Generalizing that definition in this assignment, we define a string token to be a sequence of contiguous characters excluding those specified in a user defined set of separator characters.

The separator characters are those that we do not want in a token in this assignment, and they are represented using a std::string object. For example, the separators in assignment 1 can be defined as "\n\t\0 "; that is, the new line, tab, null, and blank characters separate tokens in a text.

3              Class TokenizerApp
TokenizerApp is the name of a C++ file that contains the main function, and possibly other functions, to test drive the functionality of a Tokenizer object. It is a simple menu-driven program that displays a menu of options for the user to choose from:

Enter the name of an input file of text: input text file.txt Enter the seperator characters: ;. ,?!"=’:

Menu

======

A - Print all input lines

P - Print indexed tokens

F - Print tokens sorted on frequency

L - Print tokens sorted on length

S - Search

X - Exit

Enter your choice:
1

2

3

4

5

6

7

8

9

10

11

12

3.1            Option A (or a)
Prints all of the input lines:

Enter your choice: a

1: Do you like green eggs and ham?

2:

3: I do not like them, Sam-I-am.

4: I do not like green eggs and ham!

5:

6: Would you like them here or there?

7:

8: I would not like them here or there.

9: I would not like them anywhere.

10:

11: I do so like green eggs and ham!

12: Thank you! Thank you, 13: Sam-I-am!

Menu

======

A - Print all input lines

P - Print indexed tokens

F - Print tokens sorted on frequency

L - Print tokens sorted on length

S - Search
12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

X - Exit

Enter your choice:
35

36

3.2            Option P (or p)
Prints all tokens in alphabetic order, similar to that in assignment 1, with one noticeable difference: the numbers inside the parentheses, each indicating the frequency of the corresponding token in the input file.

Enter your choice: p

and (3) 1 4 11

anywhere (1) 9

Do (4) 1 3 4 11 eggs (3) 1 4 11 green (3) 1 4 11 ham (3) 1 4 11

here (2) 6 8

I (5) 3 4 8 9 11 like (7) 1 3 4 6 8 9 11

not (4) 3 4 8 9

or (2) 6 8

Sam-I-am (2) 3 13 so (1) 11

Thank (2) 12 them (4) 3 6 8 9

there (2) 6 8

Would (3) 6 8 9 you (4) 1 6 12

Menu

======

A - Print all input lines

P - Print indexed tokens

F - Print tokens sorted on frequency

L - Print tokens sorted on length

S - Search

X - Exit

Enter your choice:
36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60

61

62

63

64

65

66

Notice that the user supplied separator characters ;. ,?!"=’: defines a very limited number of separators, allowing, for example, the strings 123 and }1+2*3{ as tokens. For a C++ source file, the user might input something like ;. ,?!"=’:|{}[]()&+-*%$#!~>^</\ for the separator characters.

3.3            Option F (or f)
Prints the tokens sorted in the ascending order of their frequencies in the input file:

Enter your choice: F

anywhere (1) 9 so (1) 11

here (2) 6 8 or (2) 6 8

Sam-I-am (2) 3 13 Thank (2) 12 there (2) 6 8

and (3) 1 4 11 eggs (3) 1 4 11 green (3) 1 4 11 ham (3) 1 4 11

Would (3) 6 8 9

Do (4) 1 3 4 11 not (4) 3 4 8 9 them (4) 3 6 8 9

you (4) 1 6 12

I (5) 3 4 8 9 11 like (7) 1 3 4 6 8 9 11

Menu

======

A - Print all input lines

P - Print indexed tokens

F - Print tokens sorted on frequency

L - Print tokens sorted on length

S - Search

X - Exit

Enter your choice:
66

67

68

69

70

71

72

73

74

75

76

77

78

79

80

81

82

83

84

85

86

87

88

89

90

91

92

93

94

95

Knowing that sorting a linked list in general is not a trivial task, let alone sorting the list efficiently, we are happy to know that std::list<Token> can sort the list for us if we tell it how to compare two tokens.

See here for a quick example.

3.4            Option L (or L)
Prints the tokens sorted on the length of each string tokens, with tokens of equal lengths sorted alphabetically.

Enter your choice: l

I (5) 3 4 8 9 11 Do (4) 1 3 4 11 or (2) 6 8 so (1) 11 and (3) 1 4 11 ham (3) 1 4 11 not (4) 3 4 8 9 you (4) 1 6 12 eggs (3) 1 4 11 here (2) 6 8 like (7) 1 3 4 6 8 9 11 them (4) 3 6 8 9

Thank (2) 12 Would (3) 6 8 9 green (3) 1 4 11 there (2) 6 8 Sam-I-am (2) 3 13 anywhere (1) 9

Menu

======

A - Print all input lines

P - Print indexed tokens

F - Print tokens sorted on frequency

L - Print tokens sorted on length

S - Search

X - Exit

Enter your choice:
95

96

97

98

99

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

Again, the std::list<Token> class can sort the list for us as long as we tell it how to compare two tokens.

3.5            Option P (or p), Again
Let us note that printing the tokens sorted on some attributes must not disturb the order of the tokens in the original indexed list. In other words, selecting option P now should print the original indexed list:

Enter your choice: p

and (3) 1 4 11

anywhere (1) 9

Do (4) 1 3 4 11 eggs (3) 1 4 11 green (3) 1 4 11 ham (3) 1 4 11

here (2) 6 8

I (5) 3 4 8 9 11 like (7) 1 3 4 6 8 9 11

not (4) 3 4 8 9

or (2) 6 8

Sam-I-am (2) 3 13 so (1) 11

Thank (2) 12 them (4) 3 6 8 9

there (2) 6 8

Would (3) 6 8 9 you (4) 1 6 12

Menu

======

A - Print all input lines

P - Print indexed tokens

F - Print tokens sorted on frequency

L - Print tokens sorted on length

S - Search

X - Exit

Enter your choice:
124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

3.6            Option S (or s)
Prompts the user for a token to search for; if found, prints the input lines on which the token appears; otherwise, displays the message "token not found".

Enter your choice: s

Enter the text to search for: you

1: Do you like green eggs and ham?

6: Would you like them here or there?

12: Thank you! Thank you,

Menu

======

A - Print all input lines

P - Print indexed tokens

F - Print tokens sorted on frequency

L - Print tokens sorted on length

S - Search

X - Exit

Enter your choice:
154

155

156

157

158

159

160

161

162

163

164

165

166

167

168

169

170

3.7            Option X (or x)
Enter your choice: x

Thank you for trying my program.

Goodbye.
170

171

172

173

4              Class Token
4.1            Representation
1

2

3

4

class Token

{ private:

             string theText{};                                                      // the text of this token

vector<size_t> theLineNumbers{}; // this token’s list of (non-negative) line size_t theFrequency{1}; // the frequency of this token in the input
numbers
file
We choose size t over int as the type of the line numbers, because in this application we only deal non-negative line numbers and size t represents non-negative (unsigned) integers.

The size of theLineNumbers is at least 1, because a string token must reside on some line in the input file, and every input line has a number ≥1.

Note that the size of theLineNumbers may not necessarily represt the frequency because a string token can appear multiple times on a single input line.

4.2            Normal constructors
This is the only normal (non-special) constructor of the class; it sets the frequency to 1, and initializes the text and line number of the token being constructed with the corresponding supplied argument values.

Token(string text, size_t linenum);
// a normal constructor
7

4.3            Default constructor
Since a token cannot exist without a text and associated line number, we decide to disallow default construction; so we make our decision visible to both the human reader as well as the compiler by explicitly declaring the default constructor deleted.

public:

Token()
= delete; // disable default construction
8

9

Note that disabling the default constructor of a class, in turn, disables creating an array of objects of that class. That is fine in this application, but it may not be an option in another application, forcing us to define a default Token object.

4.4            Other Special Member Functions: The Big Five
Modern C++ programming style encourages that these members be explicitly either defined, deleted, or defaulted. Since Token is not involved directly in dynamic resource allocation, the compiler-generated versions of these members are most appropriate for Token objects.

// the big five: three choices (either default, delete, or define);

// avoid relying on implicit generation of special member functions

Token& operator=(const Token& rhs) = default; // copy op=

Token& operator=(Token&& rhs)                               = default; // move op=

Token(const Token& source)                                          = default; // copy ctor

Token(Token&& source)                                                  = default; // move ctor

~Token()                                                                                = default; // dtor
10

11

12

13

14

15

16

17

Consequently,

copy/move constructors copy/move the source token’s theFrequency, theText, and theLineNumbers members to theFrequency, theText, and theLineNumbers of the Token object being constructed. Note that these data members in turn propagate and apply copy/move construction internally to their respective members, recursively.

copy/move assignments copy/move-assign the right-hand side token’s theFrequency, theText, and theLineNumbers members to the left-hand side token theFrequency, theText, and theLineNumbers members. Note that the data members in turn propagate and apply the copy/move assignment internally to their respective members, recursively.

4.5            Comparison operations
In this assignment we use case insensitive comparison to compare the text of the tokens.

// comparison member function (returns -1, 0, +1, as in A1)

int compareIgnoreCase(const Token& t) const;                                              // case insensitive comparison
18

19

20

See Here for an example of a case insensitive comparison function.

4.6            Other Self-Explanatory Operations
21

22

23

24

25

26

27

28

// getter members; each is doubly safe!

// since each is const, the invoking object remains intact, and,

// returning by value, each adheres to the principle of information hiding

string             getTheText()      const; vector<size_t> getTheLineNumberList() const; size_t             getFrequency()        const;

          size_t                                getLineNumber(size_t = 1) const; // line number is 1-based

// to provide user friendly

void pushBackLineNumber(size_t lineNum); // append the suppled line number void print(ostream& sout) const; // print this token to sout

}; #endif ostream& operator<<(ostream& sout, const Token& arr);
29                                                                                                                                                                                                                                                                                                            interface

30

31

32

33

34

35

5              Class Tokenizer
A Tokenizer object provides a minimal set of services, allowing a typical menu-driven program to produce an interactive session as shown above.

5.1            Representation
class Tokenizer

{ private:

const string theSeparators;
// the separator characters in a std::string
list<Token> theTokenList;
// the list of tokens managed by this tokenizer
vector<string> theLines;
// the lines in the input file
1

2

3

4

5

6

5.2            Default constructor
Since a Tokenizer cannot exist without an input file of text, we decide to disallow default construction; so we make our decision visible to both the human reader as well as the compiler by explicitly declaring the default constructor deleted.

public:

Tokenizer()
= delete;
// disable default constructor
7

8

5.3            Normal constructor
Supplied with the name of an input file of text and with a string of separator characters, this constructor extracts the lines from the input file one line at at a time, delegating the process of extracting the tokens in a line and keeping track of the associated line numbers to ProcessTokensInLine(text, line number), a private facilitator member function.

Tokenizer(const string& filename, const string& separators);
9

5.4            Other Special Member Functions: The Big Five
Modern C++ programming style encourages that these members be explicitly either defined, deleted, or defaulted. Since our class is not involved directly in dynamic resource allocation, the compiler-generated versions of these members are most appropriate to use.

~Tokenizer()                 = default;                  // dtor

Tokenizer(const Tokenizer&)                           = default; // copy ctor

Tokenizer(Tokenizer&&)                                   = default; // move ctor

Tokenizer& operator=(const Tokenizer&) = default; // copy op=

Tokenizer& operator=(Tokenizer&&)                                = default; // move op=
10

11

12

13

14

5.5            Private Facilitators
private:

void ProcessTokensInLine(const string& line, size_t linenum);
5.5.1        15
16

Outsources the process of extracting the tokens in line and inserting them each into the token list to two private facilitator (helper) member functions:

To extracts the tokens from line, it delegates to member function splitLineIntoTokens, which in turn returns the extracted tokens in a std::vector<string>, say, vec.
To turn the string tokens in vec into Token objects and to store the resulting objects in the token list, it delegates to the member function insert, one vec element at time.
vector<string> splitLineIntoTokens(const string& line) const;
5.5.2        17
Using the separator characters, it splits the given line of text into string tokens, storing and returning them all in a std::vector of std::strings.

Requirement Your implementation of this member function may only use the std::string class, which offers a useful set of string operations, including substr, find first of, find first not of, and a few more.

void insert(string text, size_t linenum);
5.5.3        18
This function is equivalent to the addSorted member function in the TList class in A1.

Specifically, it first creates a Token object using the supplied text and line number; it then compares that object against the Token objects in the token list, which are already sorted in ascending order; if found, it updates the number list of that object; otherwise, it inserts the newly created Token object into the token list.

Requirement To scan and to insert into the token list, your implementation of this member function must use list<Token>::iterators.

vector<size_t> search(const string& str)const;
5.5.4        19
Searches the token list for a token object whose text is equal (case insensitive) to that of the given string str. If found, it returns a copy of that object’s line number list; otherwise, it returns an empty number list.

void printSomeInputLines(const vector<size_t>& vec)const;
5.5.5        20
Given a std::vector of line numbers, prints the input lines corresponding to those line numbers.

5.6            Public Interface
public:

void searchAndPrint(string& str)const;
5.6.1        21
22

Searches the token list for a token object whose text is the same as the given string str; if found, it prints the input lines that contain str.

public:

void printAllInputLines()const;
5.6.2        23
24

Prints all input lines.

public:

void print(ostream& sout)const;
5.6.3        25
26

Prints the token list to the given output stream sout.

public:

void sortOnFrequecy()const;
5.6.4        27
28

Prints the token list sorted on frequencies of the tokens in the input file.

Requirements
Since the token list must remain intact, we need to copy it into another linear sequence and then sort that linear sequence.
Our options here are limited to std::list, std::vector, and std::forward list.

Familiar with std::list and std::vector, we take advantage of this opportunity to use std::forward list, which already comes equipped with a sort member fuction ( std::vector does not).

This member function must use std::forward list’s sort
(Hint: simply implement the compareFrequency function given in 5.7.3 below and pass it to std::forward list’s sort member function as the argument.)

void sortOnTokenLength()const;

};
5.6.5        29
30

Prints the token list sorted on the text of the tokens.. Requirements

Similar to 5.6.4 above, copy the token list into a std::forward list and then sort that forward list using std::forward list’s own sort member function.
This member function must use std::forward list’s sort
(Hint: simply implement the compareLength function given in 5.7.2 below and pass it to std::forward list’s sort member function.)

5.7            Free (top-level) helper functions
ostream& operator<<(ostream&, const Tokenizer&);
5.7.1        31
Writes a given token to a given stream

bool compareLength(const Token& t1, const Token& t2);
5.7.2        32
Determines whether t1 is less than t2, comparing their lengths; if they are of equal length, then determines whether t1 is less than t2, alphabetically (using std::string’s operator<).

bool compareFrequency(const Token& t1, const Token& t2);
5.7.3        33
Comparing the frequencies of t1 and t2, determines whether t1 is less than t2.

More products