CSE320- Serialize_Deserialize Solved

Part 2: Serialized Data Format
Transplant uses a serialized data format that has been designed with some features to help detect whether or not the serialized data being read has been corrupted. The serialized data produced by "transplant" consists of a sequence of records. Each record starts with a fixed-format header, which specifies the type of the record and the total length of the record in bytes (including the header). Some types of records consist only of a header. Other types contain data that immediately follows the header.

Each record header consists of 16 bytes of data, having the following format:

"Magic" (3 bytes): 0x0C, 0x0D, 0xED

Type (1 byte)

Depth (4 bytes)

Size (8 bytes)

The first three bytes are the "magic sequence" of values 0x0C, 0x0D, 0xED. The presence of this sequence in each record helps a program reading the serialized data to detect corruption: if a record header is expected but the magic sequence is not seen, the program can immediately terminate with an error report. Following the magic sequence is a single byte that specifies the type of the record. The possible record types are listed below. Following the type are four bytes that specify the depth in the tree for this record, as detailed below. The depth is specified as an unsigned 32-bit integer in big-endian format (most significant byte first). Following the depth are 8 bytes that specify the size of the record as an unsigned 64-bit integer, also in big-endian format. The size of the record is the total number of bytes comprising the record, including the header bytes as well as any additional data after the header.
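The header layout described above can be sketched in C as follows. This is only an illustrative sketch: it parses a header that has already been read into a 16-byte buffer, and the names HEADER_SIZE, struct record_header, read_be, and parse_header are hypothetical, not the definitions from transplant.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEADER_SIZE 16  /* hypothetical name for the fixed header length */

/* Hypothetical in-memory form of a record header. */
struct record_header {
    uint8_t  type;   /* byte 3 */
    uint32_t depth;  /* bytes 4..7, big-endian on the wire */
    uint64_t size;   /* bytes 8..15, big-endian on the wire */
};

/* Read an unsigned big-endian integer of n bytes (n <= 8) starting at p. */
static uint64_t read_be(const unsigned char *p, size_t n) {
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v = (v << 8) | p[i];  /* most significant byte comes first */
    return v;
}

/*
 * Parse a 16-byte header from buf into *out.
 * Returns 0 on success, or -1 if the magic sequence is wrong or the
 * size field is smaller than the header itself.
 */
int parse_header(const unsigned char *buf, struct record_header *out) {
    assert(buf != NULL && out != NULL);
    if (buf[0] != 0x0C || buf[1] != 0x0D || buf[2] != 0xED)
        return -1;
    out->type  = buf[3];
    out->depth = (uint32_t)read_be(buf + 4, 4);
    out->size  = read_be(buf + 8, 8);
    if (out->size < HEADER_SIZE)
        return -1;
    return 0;
}
```

Building the integers with shifts, rather than copying the bytes into an integer variable, keeps the result independent of the byte order of the machine running the program.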

There are six different record types:

START_OF_TRANSMISSION (type = 0)

END_OF_TRANSMISSION (type = 1)

START_OF_DIRECTORY (type = 2)

END_OF_DIRECTORY (type = 3)

DIRECTORY_ENTRY (type = 4)

FILE_DATA (type = 5)

Serialized data produced by "transplant" always begins with a single START_OF_TRANSMISSION record and ends with a single END_OF_TRANSMISSION record. These records consist only of a header (and so their size is 16 bytes). The depth fields of these records contain the value 0.

A START_OF_DIRECTORY record indicates the beginning of data corresponding to a subdirectory and an END_OF_DIRECTORY record indicates the end of data corresponding to a subdirectory. These records also consist only of a header. A START_OF_DIRECTORY record has a depth that is one greater than that of the DIRECTORY_ENTRY record that immediately precedes it. The corresponding subdirectory data comprises the consecutive sequence of records following the START_OF_DIRECTORY record which have a depth greater than or equal to that of the START_OF_DIRECTORY record. This sequence of records is required to be terminated by an END_OF_DIRECTORY record with the same depth, so that START_OF_DIRECTORY and END_OF_DIRECTORY records of each depth always occur in matched pairs, like parentheses, with the records giving the content of that subdirectory sandwiched in between.
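The parenthesis-like pairing can be checked with a small stack of open directory depths. The sketch below is one possible way to do it, assuming the record types and depths have already been extracted into an array. The names struct rec, MAX_NESTING, and directories_balanced are hypothetical, and the nesting limit of 64 is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Type codes from the list of record types above. */
enum { START_OF_DIRECTORY = 2, END_OF_DIRECTORY = 3 };

#define MAX_NESTING 64  /* arbitrary limit chosen for this sketch */

/* Hypothetical summary of one record: just its type and depth. */
struct rec {
    uint8_t  type;
    uint32_t depth;
};

/*
 * Return 1 if every START_OF_DIRECTORY is closed by an END_OF_DIRECTORY
 * of the same depth and every record in between has a depth greater
 * than or equal to that of the enclosing START_OF_DIRECTORY; else 0.
 */
int directories_balanced(const struct rec *recs, size_t n) {
    uint32_t open[MAX_NESTING];  /* depths of currently open directories */
    size_t top = 0;

    assert(recs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint32_t d = recs[i].depth;
        if (recs[i].type == START_OF_DIRECTORY) {
            if (top == MAX_NESTING)
                return 0;                      /* nested too deeply for this sketch */
            if (top > 0 && d <= open[top - 1])
                return 0;                      /* a nested directory must be deeper */
            open[top++] = d;
        } else if (recs[i].type == END_OF_DIRECTORY) {
            if (top == 0 || open[top - 1] != d)
                return 0;                      /* no matching open directory */
            top--;
        } else if (top > 0 && d < open[top - 1]) {
            return 0;                          /* record escaped its directory */
        }
    }
    return top == 0;                           /* everything opened was closed */
}
```

This checks only the pairing and depth rules stated above; it does not verify the magic sequence, record sizes, or the position of the transmission records.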

A DIRECTORY_ENTRY record specifies the name of an entry within the current directory, along with metadata associated with that entry. There are DIRECTORY_ENTRY records both for regular files and for subdirectories. The metadata, which has a fixed length (see below), occurs immediately following the header. Following the metadata is a file name, whose length may vary. File names consist simply of a sequence of arbitrary bytes (with no null terminator). The length of the name can be determined by subtracting from the total size of the record the size of the header and the size of the metadata.

A FILE_DATA record specifies the content of the file whose name is given by the immediately preceding DIRECTORY_ENTRY record. The content consists of the sequence of bytes of data immediately following the FILE_DATA header. The length of this sequence of bytes can be determined by subtracting the size of the record header from the total size of the record.
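The two length calculations described above (file name length and file content length) amount to simple subtractions, but a corrupted size field could make them underflow. A minimal sketch with that check, using hypothetical names (HEADER_SIZE, METADATA_SIZE, entry_name_length, file_data_length) rather than the definitions in transplant.h:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEADER_SIZE   16  /* fixed header length in bytes */
#define METADATA_SIZE 12  /* fixed DIRECTORY_ENTRY metadata length in bytes */

/*
 * Name length of a DIRECTORY_ENTRY record: total size minus header and
 * metadata. Returns 0 and stores the length in *len, or returns -1 if
 * the record is too small to hold a header and metadata.
 */
int entry_name_length(uint64_t record_size, uint64_t *len) {
    assert(len != NULL);
    if (record_size < HEADER_SIZE + METADATA_SIZE)
        return -1;
    *len = record_size - HEADER_SIZE - METADATA_SIZE;
    return 0;
}

/*
 * Content length of a FILE_DATA record: total size minus the header.
 * Returns 0 and stores the length in *len, or returns -1 if the record
 * is smaller than a header.
 */
int file_data_length(uint64_t record_size, uint64_t *len) {
    assert(len != NULL);
    if (record_size < HEADER_SIZE)
        return -1;
    *len = record_size - HEADER_SIZE;
    return 0;
}
```

For example, a DIRECTORY_ENTRY record of size 36 carries an 8-byte name (36 - 16 - 12), and a FILE_DATA record of size 16 describes an empty file.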

The metadata contained in a DIRECTORY_ENTRY record consists of the following 12 bytes of data:

4 bytes of type/permission information (type mode_t, in big-endian format), as specified for the st_mode field of the struct stat structure in the man page for the stat() system call.

8 bytes of size information (type off_t, in big-endian format), as specified for the st_size field of the struct stat structure.
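As an illustration of the 12-byte metadata layout, the following sketch writes the two fields in big-endian order into a buffer. The function names write_be and encode_metadata are hypothetical. The sketch takes fixed-width integers, on the assumption that the caller converts the st_mode and st_size values obtained from stat() before calling it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write the low n bytes of v (n <= 8) to p, most significant byte first. */
static void write_be(unsigned char *p, uint64_t v, size_t n) {
    for (size_t i = 0; i < n; i++)
        p[i] = (unsigned char)(v >> (8 * (n - 1 - i)));
}

/*
 * Fill out[0..11] with DIRECTORY_ENTRY metadata:
 * 4 bytes of mode followed by 8 bytes of size, both big-endian.
 */
void encode_metadata(unsigned char *out, uint32_t mode, uint64_t size) {
    assert(out != NULL);
    write_be(out, mode, 4);
    write_be(out + 4, size, 8);
}
```

A regular file with permissions 0644 has an st_mode of 0100644 (octal), which is 0x81A4, so its first four metadata bytes would be 00 00 81 a4.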

Note that the header file transplant.h contains various definitions associated with the serialized data format described above. You should use the definitions from this header file where appropriate.

Part 3: Required Functions and Variables
The header file const.h contains specifications of functions that you must implement, as well as the declarations of several variables and arrays that you must use for their stated purposes. You must implement these functions because we will want to be able to test them









 








The > symbol tells the shell to perform "output redirection": the file outfile is created (or truncated if it already existed -- be careful!) and the output produced by the program on stdout is sent to that file instead of to the terminal.

$? is a special shell variable in bash that holds the exit status of the most recently run program. In the above, the echo command is used to display the value of this variable.







In the following example:



serialized data is redirected from the file outfile, becoming the stdin seen by the program.

For debugging purposes, the contents of outfile can be viewed using the od ("octal dump") command:



This is the serialized output from the rsrc/testdir example discussed above (note that the rsrc/testdir test directory is included with the base code). The successive bytes at the beginning of the file have the hexadecimal values 0c 0d ed 00 00 00 ... You can readily identify this as the start of a record, due to the presence of the "magic sequence" 0c 0d ed. The values in the first column indicate the offsets (in bytes) from the beginning of the file, specified as 7-digit octal (base 8) numbers.
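If it helps to produce the same kind of dump from inside a debugging build, one line of that layout (a 7-digit octal offset followed by each byte as two hexadecimal digits) can be formatted as in the sketch below. The function name format_dump_line is hypothetical, and this is only an approximation of the od layout: it does not reproduce other od behavior, such as collapsing repeated lines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format one dump line into dst (capacity cap, including the NUL):
 * the offset as 7 octal digits, then each of the n bytes as " xx".
 * Returns the number of characters written (excluding the NUL),
 * or -1 if dst is too small.
 */
int format_dump_line(char *dst, size_t cap, size_t offset,
                     const unsigned char *bytes, size_t n) {
    assert(dst != NULL && cap > 0);
    int used = snprintf(dst, cap, "%07zo", offset);
    if (used < 0 || (size_t)used >= cap)
        return -1;
    for (size_t i = 0; i < n; i++) {
        size_t room = cap - (size_t)used;
        int w = snprintf(dst + used, room, " %02x", (unsigned)bytes[i]);
        if (w < 0 || (size_t)w >= room)
            return -1;
        used += w;
    }
    return used;
}
```

Calling it with offset 16 and the bytes 0c 0d ed 00 yields the text "0000020 0c 0d ed 00", since 16 decimal is 20 in octal.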



The -t x1 flag instructs od to interpret the file as a sequence of individual bytes (that is the meaning of the "1" in "x1"), which are printed as hexadecimal values (that is the meaning of the "x" in "x1"). The od program has many options for setting the output format; another useful version is od -bc, which shows individual bytes of data as both ASCII characters and their octal codes.

Refer to the "man" page for od for other possibilities.



If you use the above command with an outfile that is much longer, there will be so much output that the first few lines scroll off the top of the screen. To avoid this, you can pipe the output to a program called less:



This will display only the first screenful of the output and give you the ability to scan forward and backward to see different parts of the output.

Type h at the less prompt to get help information on what you can do with it. Type q at the prompt to exit less.

Alternatively, the output of the program can be redirected via a pipe to the od command, without using any output file:
 




















Testing Your Program

 

Your program can be used with pipes to form a one-line command that will "transplant" a tree of files and directories from one place to another, by serializing files from a source directory and deserializing them into a different target directory:



If the program is working properly, the files and directories under mydir will be copied to otherdir. To verify that the content of mydir and otherdir is now identical, you can use the diff command (use man diff to read the manual page):



For text files, diff will report the differences between the files in an understandable form, but for binary files, diff will just report that the files differ. To actually see the differences between binary files, you can use the cmp command to perform a byte-by-byte comparison of two files, regardless of their content:



If the files have identical content, cmp exits silently. If one file is shorter than the other, but the content is otherwise identical, cmp will report that it has reached EOF on the shorter file. Finally, if the files disagree at some point, cmp will report the offset of the first byte at which the files disagree. If the -l flag is given, cmp will report all disagreements between the two files.
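The comparison that cmp performs can be pictured with the following sketch, which works on two in-memory buffers instead of files. The function name first_difference is hypothetical. Note that it returns 0-based offsets, whereas cmp numbers bytes starting from 1.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Compare two byte buffers.
 * Returns -1 if they are identical. Otherwise returns the 0-based
 * offset of the first differing byte; if one buffer is a proper prefix
 * of the other, that is the length of the shorter buffer (the point
 * where its "EOF" is reached).
 */
long first_difference(const unsigned char *a, size_t na,
                      const unsigned char *b, size_t nb) {
    assert(a != NULL && b != NULL);
    size_t n = na < nb ? na : nb;
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return (long)i;
    return na == nb ? -1 : (long)n;
}
```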

We can take this a step further and run an entire test without using any files:



<(...) is known as process substitution. It allows the output of the program(s) inside the parentheses to appear as a file for the outer program.

cat is a command that outputs a file to stdout.



If the content of the reference file was identical to the output of transplant, cmp outputs nothing (as in this example).

Unit Testing

 

Unit testing is a part of the development process in which small testable sections of a program (units) are tested individually to ensure that they are all functioning properly. This is a very common practice in industry and is often a requested skill by companies hiring graduates.

 

Some developers consider testing to be so important that they use a workflow called test-driven development (TDD). In TDD, requirements are turned into failing unit tests. The goal is then to write code to make these tests pass.



This semester, we will be using a C unit testing framework called Criterion, which will give you some exposure to unit testing. We have provided a basic set of test cases for this assignment.

The provided tests are in the tests/hw1_tests.c file. These tests do the following:

 validargs_help_test ensures that validargs sets the help bit correctly when the -h flag is passed in.
