$30
You are going to complete a program that sorts the nodes of a linked list. The program will load data from an input file specified by the user and create a linked list of pointers to data objects using the provided C++ list class. Each data object will consist of three C++ string fields which store the last name, first name, and social security number (ssn) of a hypothetical person. The last names have been chosen randomly from a set of the 500 most common last names according to the 2010 U.S. census. The first names have been chosen randomly from a set of the 250 most common male names and the 250 most common female names in the U.S. between 2010 and 2017. (It shouldn't really matter if the data is accurate; I found it on-line, and I have posted text files containing the last names and first names used for the assignment.) The ssns have been randomly generated and have the format ddd-dd-dddd, where each character 'd' represents a digit. (Within each dataset, ssns are guaranteed to be unique.)
After creating the list, the program will sort the list. Items should be sorted according to last names; if last names are identical, they should be sorted according to first names; if both last and first names are identical, they should be sorted according to ssns (which are guaranteed to be unique). The sorted list will then be written to an output file. The input and output files will have the same format. The first row will be an integer indicating how many rows follow. Each row after that represents a single data object, including a last name, first name, and ssn, separated by single spaces. There will be no leading or trailing whitespace, and each row will be followed by a Unix-style newline character ('\n').
I am providing you with code that handles most aspects of the program, and you may not make any changes to the provided code or its behavior (this will be explained further in class). The provided code includes the implementation of a simple class to store the data objects, the file loading routine that loads the data from an input file, and the file saving routine that writes the sorted data to an output file. There is also a call to a sort routine that you must fill in. You may also add additional functions, additional class definitions, or additional global variables, if you wish, but all the added code must be included below a specific comment which indicates that the code above the comment may not change. (You should even be able to include additional provided header files below the comment if you wish. If you want to do this, but your compiler does not support it, then include them at the top of the file, and mention this in your e-mail when you submit the program.) I may use the "diff" command to make sure you have not changed any code above the comment.
Your program will be tested on four test cases, which we will call T1…T4. For each of these test cases, a single file will exist using the format specified earlier; i.e., the first row will indicate the number of rows to follow, and each additional row will contain three strings representing a person's last name, first name, and ssn. The contents of the four test cases will also adhere to the following specifications:
· T1 will contain approximately (within 1 percent of) 100,000 data objects. Each data object will store a randomly selected last name, a randomly selected first name, and a randomly generated social security number (ssns are guaranteed to be unique within T1).
· T2 will contain approximately (within 1 percent of) 1,000,000 data objects. Each data object will store a randomly selected last name, a randomly selected first name, and a randomly generated social security number (ssns are guaranteed to be unique within T2).
· T3 will contain the same data objects as T2, but they have already been sorted according to last names and first names (but not according to social security numbers).
· T4 will contain approximately (within 1 percent of) 1,000,000 data objects. Each data object will store the same last name and the same first name (both selected randomly, but only once), and a randomly generated social security number (ssns are guaranteed to be unique within T4).
Every working program will be assigned a score that is based on the CPU times that the program takes to sort the four test cases. If time1…time4 are the CPU times required by the program when tested on T1 through T4, respectively, the overall score for the program will be time1 + time2 + time3 + time4 (i.e., the sum of the four times). Assuming that the program works (i.e., it generates the correct output for all four test cases), your program will be graded almost entirely based on this overall score just described. I expect working programs; penalties for non-working programs will be discussed in class. I reserve the right to take off a few points for poor indentation (which does not affect speed, since it is ignored by the compiler), but otherwise, you will not lose points for lack of elegance. Almost anything goes, as long as your write the program individually and do not violate any of the rules described here or in class. You may use any standard C++ classes or routines to which you have access, including the provided sort member function of the C++ list class (which provides an implementation of merge sort) or the provided sort function from the C++ algorithm library (which probably provides an implementation of a modified quicksort).