In this programming assignment, you are asked to implement a program in either Java or Python 3. Your program should download and process data obtained from an HTTP server. The goal of the assignment is to familiarize you with the HTTP protocol and TCP socket programming. You must implement your program using either the Java Socket API of the JDK or the socket module of your default Python distribution. If you have any doubt about what you may or may not use, please contact your teaching assistant Ayhan Okuyan at ayhan.okuyan[at]bilkent.edu.tr. For Python, the use of any class or function from the http module or the requests package is prohibited.

When preparing your project, please keep in mind that it will first be evaluated by a computer program. Any formatting problems can cause problems in grading. Errors caused by incorrectly naming the project files or the folder structure will cause you to lose points.

2) Specifications

The server program provided to you (written in Python 3) communicates with the client using an application-layer protocol similar to HTTP/1.1, running over TCP. On the server side, there are three entities that you will separately download and process. For easier understanding, the project is divided into three parts, each building on the previous one. It is suggested that you start from the first part and continue sequentially.

To work on these problems, you first need to run the server program. Run it from a terminal in the homework folder with the following command. Note that you need an appropriate Python 3 installation; the server code is written in Python 3.7.7.

python3 serv.py

This program opens a socket and listens on port 8000 of your local machine (http://localhost:8000/). If you receive a binding error while running the program, port 8000 may already be in use by another process, so consider changing it.

A.
First HTTP Connection and TCP Socket Programming

This first part of the project asks you to create a TCP socket as an HTTP client and connect it to the socket on which the server program is listening. Then retrieve the webpage data by sending a simple HTTP GET request of the following form:

GET <filename> HTTP/1.1\r\n
Host: <host>\r\n\r\n

where <filename> is the name of the requested file, beginning with "/", and <host> is the host you are trying to reach, e.g. localhost:8000. When a webpage entity is requested for the first time, its 'index.html' page is requested; alternatively, you can simply set <filename> to '/' to request this page. Once the content is retrieved, store it as an HTML file named 'index2.html'. Then extract the name of the entity that you will try to obtain next, which is hidden in the HTML code. Parsing and obtaining the entity name is part of the assignment, and points will be deducted if it is missing. In the report, briefly explain how the GET requests and the corresponding responses operate.

B. HTTP Authentication

In the second part, you are asked to reach and download the content whose name you discovered in the previous part. However, this part of the site is protected with the Basic authentication scheme. For further information, examine the HTTP Authentication page. First, try to access the page using the method you followed in the previous part. Then send a GET request with a Basic Authorization header. You are provided the username 'bilkentstu' and the password 'cs421s2021'. Combine these credentials in the form <username>:<password> and base64-encode the result to obtain the authentication key. The server program may return the response codes listed in the response-code table for this part. In your report, answer the following questions.

• Why did the first attempt result in an error code? Why do we use authorization?
• Why do we use an encoding when sending the authorization request?
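A minimal Python sketch of the requests in Parts A and B, using only the standard socket and base64 modules (no http or requests, as the rules require; it assumes base64 is permitted). The helper names basic_auth_value, build_request, read_response, and get are hypothetical, and the sketch assumes every response announces its body size in a Content-Length header, which may not hold for the provided server.

```python
import base64
import socket


def basic_auth_value(username, password):
    """Return 'Basic <base64(username:password)>' for the Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_request(method, path, host, headers=None):
    """Build a request line, a Host header, optional extra headers, and a blank line."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    lines += [f"{name}: {value}" for name, value in (headers or {}).items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def read_response(sock):
    """Read one response; assumes (hypothetically) the body length is in Content-Length."""
    data = b""
    while b"\r\n\r\n" not in data:  # the header block ends with an empty line
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("server closed the connection")
        data += chunk
    head, _, body = data.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    length = int(headers.get("content-length", 0))
    while len(body) < length:  # keep reading until the whole body has arrived
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("server closed the connection")
        body += chunk
    return int(status_line.split()[1]), headers, body[:length]


def get(sock, path, host, auth=None):
    """Send a GET (with an Authorization header if auth is given) and read the reply."""
    headers = {"Authorization": auth} if auth else None
    sock.sendall(build_request("GET", path, host, headers))
    return read_response(sock)
```

For example, after opening one connection with socket.create_connection(("localhost", 8000)), get(sock, "/", "localhost:8000") should return the index page to be saved as 'index2.html'. Requesting the protected entity first without and then with auth=basic_auth_value("bilkentstu", "cs421s2021") lets you compare the two status codes the questions above refer to.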
Save the HTML content obtained from this page as 'protected2.html'. Then, as in the previous part, extract the name of the entity that will be processed in the following part. Parsing and obtaining the entity name is part of the assignment, and points will be deducted if it is missing.

Response codes for Part B:

STATUS CODE | NAME | FUNCTION
200 | OK | The request has succeeded.
401 | Unauthorized | The client must authenticate itself to get the requested response.
403 | Forbidden | The client does not have access rights to the content; that is, it is unauthorized, so the server refuses to give the requested resource.
404 | Not Found | The server cannot find the requested page.

C. HTTP Range Requests

When downloading or streaming particularly large entities from the web, e.g. videos, it may not be feasible to do so with a single GET request. In such cases, range requests are used to retrieve parts of the data. To be able to do that, first send a HEAD request of the following form to retrieve the length of the data:

HEAD <filename> HTTP/1.1\r\n
Host: <host>\r\n\r\n

Use this request to get information on the text entity whose name you obtained in the previous part, and on the 'index.html' file. In your report, explain what the received headers do and compare the two. After obtaining the length, write code that downloads the text file in ranges of [10, 100, 1000, 10000, 15000] bytes and records the total download time for each case. In your report, provide the execution times and a plot of execution time vs. range size. Comment on why the graph has the shape you obtained. Discuss what would happen if you tried to download the text file with a single GET request (without range requests). Also give a brief description of the challenges you faced during the implementation process. Since HTTP is a stateless protocol, the client should track the arrival of the last part of the file using the information in the incoming response headers.
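A minimal Python sketch of the Part C download loop over one persistent connection: a HEAD request for Content-Length, then consecutive GET requests with a Range header, timing only the range requests. It assumes the provided server follows standard HTTP range semantics (bytes=first-last with an inclusive last offset, answered with 206 and a Content-Length header); the helper names build_request, exchange, byte_ranges, and download_in_ranges are hypothetical. If the text entity is protected, the Authorization header from Part B can be passed as auth_headers.

```python
import socket
import time


def build_request(method, path, host, headers=None):
    """Build a raw request with CRLF line endings and a terminating blank line."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    lines += [f"{name}: {value}" for name, value in (headers or {}).items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def exchange(sock, request, expect_body=True):
    """Send one request and read exactly one response on the open connection.

    A HEAD response carries Content-Length but no body, so expect_body=False
    stops reading after the blank line that ends the headers.
    """
    sock.sendall(request)
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("server closed the connection")
        data += chunk
    head, _, body = data.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    length = int(headers.get("content-length", 0)) if expect_body else 0
    while len(body) < length:
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("server closed the connection")
        body += chunk
    return int(status_line.split()[1]), headers, body[:length]


def byte_ranges(total_length, step):
    """Yield inclusive (first, last) offsets that cover total_length bytes."""
    for first in range(0, total_length, step):
        yield first, min(first + step, total_length) - 1


def download_in_ranges(sock, path, host, step, auth_headers=None):
    """HEAD for the length, then GET the file piece by piece; return (data, seconds)."""
    status, headers, _ = exchange(
        sock, build_request("HEAD", path, host, auth_headers), expect_body=False)
    if status != 200:
        raise RuntimeError(f"HEAD failed with status {status}")
    total = int(headers["content-length"])
    parts = []
    start = time.perf_counter()  # timing excludes the HEAD request (a design choice)
    for first, last in byte_ranges(total, step):
        range_headers = dict(auth_headers or {}, Range=f"bytes={first}-{last}")
        status, _, body = exchange(sock, build_request("GET", path, host, range_headers))
        if status != 206:  # e.g. 416 if the requested range is out of bounds
            raise RuntimeError(f"status {status} for bytes {first}-{last}")
        parts.append(body)
    # The last part is the one whose final offset equals total - 1.
    return b"".join(parts), time.perf_counter() - start
```

Calling download_in_ranges once for each step in [10, 100, 1000, 10000, 15000] on the same socket, and writing each returned byte string to its own file, yields the timings for the report. Because the connection is meant to be reused, the EXIT request belongs after the last step.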
Note that the server program is designed for a single client and uses a persistent connection inherently. This means that once your client is connected, you will not be able to connect with another client, and you can use the open connection to send and receive data more than once without creating another client. Also note that HTTP (excluding HTTP/2) is a synchronous protocol: the client waits for the server's response before sending another request. Save all of the obtained files in the format 'big<range>.txt', where <range> is the range value used for that execution. The responses you may observe in this part are as follows.

STATUS CODE | NAME | FUNCTION
206 | Partial Content | This response code is used when the client sends the Range header to request only part of a resource.
416 | Requested Range Not Satisfiable | The requested byte range is not available and is out of bounds.
404 | Not Found | The server cannot find the requested page.

In HTTP/1.1, connection persistence is controlled with the 'Connection' header: the connection stays open ('keep-alive') until one side sends 'Connection: close'. In this assignment, however, the connection with the server is inherently persistent, and to close it you must send an EXIT request, defined as follows.

EXIT HTTP/1.1\r\n
Host: <host>\r\n\r\n

3) Running your Program

Your program must be a console application (no graphical user interface, GUI, is allowed) and should be named httpclient.py or httpclient.java, depending on your choice of language. Your program should not take any arguments from the command line. You are free and encouraged to place print statements in your code to describe the functionality