Accessing the sunfire Server
To test your programs, please use your SoC UNIX ID and password to log on to sunfire first:
• If you don’t have your SoC UNIX account, please create it here: https://mysoc.nus.edu.sg/~newacct
• If you forget your SoC password, please reset it here using your NUSNET ID and password: https://mysoc.nus.edu.sg/~myacct/resetpass.cgi
• If you are using a UNIX-like system (e.g. Linux, Mac, or Cygwin on Windows), SSH should be available from the command line and you may simply type:
• If you are using a Windows machine, you may need to install an SSH client (e.g. “SSH Secure Shell Client”, available in LumiNUS Files -> “SSH Secure Shell” folder) if your machine does not have one installed. You may use the “SSH Secure File Transfer Client” software (bundled with “SSH Secure Shell Client”) to upload your programs to sunfire for testing.
• After logging on to sunfire, you can run a Python 3 script with the following command:
    /usr/local/Python-3.7/bin/python3 <path-to-script>
  For convenience, you can set a shortcut to python3 as follows:
    alias python3=/usr/local/Python-3.7/bin/python3
  This shortcut is temporary and lasts for the current SSH session only. To make it permanent, run the following command once to store the shortcut in the configuration file:
    echo alias python3=/usr/local/Python-3.7/bin/python3 >> ~/.bash_profile
  With the shortcut, you can then run a Python 3 script simply with:
    python3 <path-to-script>
Exercise 2 – PacketExtr (1 mark)
In this exercise, you are going to write a program, PacketExtr.py, to read consecutive “packets” from the stdin stream, extract their data payloads in a responsive manner and output to stdout.
We define a custom packet format for this exercise. A packet consists of a text-based header followed by a binary data payload. The header is a string formatted as “Size:␣<size>B”, where ␣ represents a white-space character, and <size> is a decimal integer giving the number of bytes of binary data that follow the header. The header ends with the “B” character, and the payload follows the header immediately, without any byte (such as “\n”) in between. For example, “Size:␣2105B” is a complete and valid header.
The binary payload can contain any byte, not limited to printable characters. Therefore, the payload should not be treated as string data in your program. This packet format clearly defines the boundary between header and binary data within a packet, and also the boundary between consecutive packets. It ensures that all information can be parsed correctly over a data stream.
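To make the format concrete, here is a minimal sketch of how one such packet could be assembled, for example when preparing your own test input. The function name make_packet is hypothetical (it is not part of the exercise), and the sketch assumes that ␣ is a single ASCII space.

```python
def make_packet(payload: bytes) -> bytes:
    """Build one packet: an ASCII header followed directly by the payload.

    Assumption: the header separator is a single ASCII space.
    """
    header = "Size: {}B".format(len(payload)).encode("ascii")
    # No separator byte (such as b"\n") goes between header and payload.
    return header + payload


# The payload may contain any byte, including b"B" or non-printable bytes.
packet = make_packet(b"\x00\xffB\n")
print(packet)  # prints b'Size: 4B\x00\xffB\n'
```

Note that the payload in this example itself contains a “B” byte, which is why a parser has to rely on the size in the header rather than search the payload for delimiters.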
Packets are fed to your program sequentially through stdin until End-Of-File (EOF) is encountered. It is guaranteed that all packets have the correct format and the correct payload size. Your program should be responsive to the input: upon receiving a full packet, it outputs the payload of that packet to stdout without any extra characters, including newlines. Unlike in Exercise 1, the program cannot read all the data at once and process it in a batch.
It is recommended that your program read and write data in binary mode instead of the default text mode. For interactive I/O, use read1() instead of the ordinary read(), and call flush() after each write() to avoid delays caused by buffering. Both read() and read1() accept one argument, the maximum number of bytes to read. The main difference is that read() returns only when the specified number of bytes has been read or End-Of-File (EOF) is encountered, while read1() returns as soon as some data arrives and may return fewer bytes than specified. You can learn more about the details at https://docs.python.org/3/library/io.html#io.BufferedReader and https://docs.python.org/3/library/io.html#io.BufferedWriter. The following code shows how to do binary I/O on stdin and stdout:
    import sys

    # read **at most** 5 bytes from stdin
    data = sys.stdin.buffer.read1(5)

    # write data to stdout and flush immediately
    sys.stdout.buffer.write(data)
    sys.stdout.buffer.flush()
Here, the data object is an instance of the bytes class. Python 3 programs can operate on binary data using bytes objects. This class is very similar to str, the string class. For example, both classes have the methods find() and split(), and both support slice operators for range access (e.g. a[0:10]). Details about this class can be found at https://docs.python.org/3/library/stdtypes.html#bytes-and-bytearray-operations.
The following shows how to manipulate bytes objects:
    # prepend b to the '...' expression to form a bytes object
    # instead of str
    pos = data.find(b'x')
    if pos >= 0:
        # if byte 'x' is found in data
        part1 = data[0:pos+1]  # this is similar to str slicing
        part2 = data[pos+1:]   # slice until the end of data
To convert a bytes object to str, your program can call the decode() method of bytes. In this exercise, there is no need to worry about text encoding, as we only use basic ASCII characters in headers.
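As an illustration of decode(), the following sketch turns a complete header into the payload size. The helper name parse_size is hypothetical, and the sketch assumes that ␣ in the header is a single ASCII space.

```python
def parse_size(header: bytes) -> int:
    """Return the payload size encoded in a header such as b'Size: 2105B'."""
    text = header.decode("ascii")  # headers contain only basic ASCII
    prefix, suffix = "Size: ", "B"  # assumption: one ASCII space after the colon
    if not (text.startswith(prefix) and text.endswith(suffix)):
        raise ValueError("not a valid header: {!r}".format(text))
    # Strip the fixed prefix and the trailing "B", then convert the digits.
    return int(text[len(prefix):-len(suffix)])


print(parse_size(b"Size: 2105B"))  # prints 2105
```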
Finally, to detect EOF on stdin, your program can check the length of the bytes object read from stdin. If your program expects to read more data from stdin but receives a zero-length bytes object, it means that there is no more data on stdin and EOF is encountered.
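Because read1() may return fewer bytes than requested, a program that needs a fixed number of bytes typically has to call it in a loop and stop early on a zero-length result. The following is a minimal sketch of that idea, not a complete solution; the helper name read_exact is hypothetical, and io.BytesIO is used here only as a stand-in for sys.stdin.buffer.

```python
import io


def read_exact(stream, n):
    """Read exactly n bytes from a binary stream using read1().

    Returns fewer than n bytes only if EOF is reached first.
    """
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read1(remaining)  # may return fewer bytes than asked
        if len(chunk) == 0:              # zero-length result means EOF
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


demo = io.BytesIO(b"abcdef")   # stand-in for sys.stdin.buffer
print(read_exact(demo, 4))     # prints b'abcd'
print(read_exact(demo, 4))     # prints b'ef' (EOF reached after 2 bytes)
```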
When testing your program on the command line, you can feed the contents of a file to the stdin of your program using file redirection (<) instead of typing into the terminal. You can also use another type of redirection (>) to save the output of your program to a file rather than letting it print to the terminal. For example, the following line feeds the file input.data to the program and saves its output to output.data:
    python3 PacketExtr.py < input.data > output.data
You can then compare the binary contents of output.data and the given reference output by running:
    cmp output.data ref-output.data
By default, cmp reports the first difference it finds between the two files. Hence, no output means that the two files are identical.
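If cmp is not available on your machine, a rough Python counterpart can be sketched as follows. The function name first_difference is hypothetical, and it returns a 0-based offset, so it is only an approximation of what cmp prints.

```python
def first_difference(a: bytes, b: bytes):
    """Return the 0-based offset of the first differing byte, or None if equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        # One input is a strict prefix of the other.
        return min(len(a), len(b))
    return None


# Typical use (file names taken from the commands above):
#   with open("output.data", "rb") as f, open("ref-output.data", "rb") as g:
#       print(first_difference(f.read(), g.read()))
print(first_difference(b"abc", b"abd"))  # prints 2
print(first_difference(b"abc", b"abc"))  # prints None
```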
Note that the command-line testing above can only test your program’s correctness. During grading, in addition to testing correctness, we will test responsiveness by setting a one-second timeout after feeding your program a full packet. That is, if your program does not output the packet payload in time, we will deem your program unresponsive. This timeout is sufficient for our test data sizes. We reiterate that your program should not do batch processing, as interactive processing is one of the key points of this exercise. We limit the size of each packet to 1 MB and the total size of all packets to 10 MB.
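One possible way to observe responsiveness locally is to feed packets through a pipe with a pause after each one. The sketch below is only an assumption about how such a check could be set up and is not the grading harness; the function name feed, the script name feeder.py, and the default delay are all hypothetical.

```python
import time


def feed(packets, out, delay=0.5):
    """Write each packet to a binary stream, flushing and pausing after each."""
    for packet in packets:
        out.write(packet)
        out.flush()        # push the whole packet downstream immediately
        time.sleep(delay)  # a responsive extractor emits the payload in this gap


# Typical use, in a hypothetical script feeder.py:
#   import sys
#   feed([b"Size: 3Babc", b"Size: 2B\x00\xff"], sys.stdout.buffer)
# and then on the command line:
#   python3 feeder.py | python3 PacketExtr.py
```

With a responsive program, each payload should appear shortly after its packet is written, rather than all payloads appearing together when the feeder exits.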
Sample run:
$ python3 PacketExtr.py < test/packets-a.in > run-a.out
$ cmp test/packets-a.out run-a.out