Starting from:

$35

CS421- Programming Assignment 1 Solved

In this programming assignment, you are asked to implement a program in either Java or Python. This program is supposed to download an index file to obtain a list of text file URLs and download some of these files depending on their sizes. For this programming assignment, you are not allowed to use any third party HTTP client libraries, or the HTTP specific core or non-core APIs. The goal of this assignment is to make you familiar with the internals of the HTTP protocol and using any (third party, core or non-core) API providing any level of abstraction specific to the HTTP protocol is not allowed. You must implement your program using either the Java Socket API of the JDK or the Socket package in your default Python distribution. If you have any doubts about what to use or not to use, please contact your teaching assistant Hamza Pehlivan at hamza.pehlivan[at]bilkent.edu.tr. For Python, the use of any class/function from http or the requests package is prohibited.  

Your program must be a console application (no graphical user interface (GUI) is required) and should be named as FileDownloader (i.e., the name of the class that includes the main method should be FileDownloader). Your program should run with the

[java/python] FileDownloader[.java/.py] <index_file> 

[<lower_endpoint>-<upper_endpoint>]

command, where <index file> and <lowerendpoint>-<upperendpoint> are the command-line arguments. The details of the command-line arguments are as follows:

•      <index file>: [Required] The URL of the index that includes a list of text file URLs.

•      <lower endpoint>-<upper endpoint>: [Optional] If this argument is not given, a file in the index is downloaded if it is found in the index. Otherwise, the bytes between <lowerendpoint> and <upper endpoint> inclusively are to be downloaded.

When a user enters the command above, your program will send an HTTP GET request to the server in order to download the index file with URL <index file>. If the index file is not found, the response is a message other than 200 OK. In this case, your program will print an error message to the command-line and exits. If the index file is found, the response is a 200 OK message. When this is the case, your program will print the number of file URLs in the index file and send an HTTP HEAD request for each file URL in the index file.

Requested file is not found: If the requested file is not found in the server, the response is a message other than 200 OK. In this case, your program will print a message to the command-line indicating that the file is not found. Then, an HTTP HEAD request is sent for the next file.

Requested file is found: If the requested file is found in the server, the response is a 200 OK message which includes the size of the file in bytes in the header. When this is the case, there are three possibilities:

1.      If the user does not give a range as a command-line argument, your program should send an HTTP GET message to obtain the content of the whole file.

2.      If a range is given as a command-line argument and the size of the file is smaller than <lowerendpoint>, the file will not be downloaded and your program will print a message to the command-line indicating that the file is not requested. Then, an HTTP HEAD request is sent for the next file.

3.      If a range is given as a command-line argument and the size of the file is not smaller than <lower endpoint>, the range is satisfiable. Then, your program should send an HTTP GET message with the range <lower endpoint>-<upper  endpoint> and obtain a part of the file content from the HTTP 206 Partial Content response.

4.      If your program successfully obtains the file or a part of the file, it saves the content under the directory in which your program runs. The name of the saved file should be the same as the downloaded file and a message indicating that the file is successfully downloaded is printed to the command-line. Then an HTTP HEAD request is sent for the next file.

In the report, provide a brief explanation as to how the GET requests and corresponding responses operate.

 

Assumptions and hints
•      Please refer to W3Cs RFC 2616 for details of the HTTP messages in general and RFC 7233 for details of range requests.

•      You will assume that <lowerendpoint> and <upper endpoint> are both non-negative integers and <lower endpoint> is not greater than <upper  endpoint>. Note that there should be a hyphen ‘-’ character between the endpoints.

•      You will assume that each line of the index file includes one file URL.

•      You will assume that the name of each file in the index is unique.

•      Your program will not save the index file to the local folder.

•      Your program should print a message to the command-line to inform the user about the status of the files.

•      The downloaded file should be saved under the directory containing the source file FileDownloader and the name of the file should be the same as the name of the downloaded file.

•      You may use the following URLs to test your program:

www.cs.bilkent.edu.tr/~cs421/fall21/project1/index1.txt www.cs.bilkent.edu.tr/~cs421/fall21/project1/index2.txt

•      Please contact your assistant if you have any doubt about the assignment.

Example
Let www.foo.com/abc/index.txt be the URL of the file to be downloaded whose content is given as

www.cs.bilkent.edu.tr/file.txt  www.cs.bilkent.edu.tr/folder2/temp.txt  wordpress.org/plugins/about/readme.txt  humanstxt.org/humans.txt www.cs.bilkent.edu.tr/cs421/deneme.txt
where the first file does not exist in the server and the sizes of the other files are 6000, 4567, 1587, and 9000 bytes, respectively.

Example run 1. Let your program start with the FileDownloader www.foo.com/abc/index.txt

command. Then all files except the first one in the index file are downloaded. After the

connection is terminated, the command-line of the client may be as follows:

 

Command-line:

URL of the index file: www.foo.com/abc/index.txt

No range is given

Index file is downloaded

There are 5 files in the index

1.  www.cs.bilkent.edu.tr/file.txt is not found

2.  www.cs.bilkent.edu.tr/folder2/temp.txt (size = 6000) is downloaded

3.  wordpress.org/plugins/about/readme.txt (size = 4567) is downloaded

4.  humanstxt.org/humans.txt (size = 1587) is downloaded

5.  www.cs.bilkent.edu.tr /cs421/deneme.txt (size = 9000) is downloaded
 

Example run 2. Let your program start with the FileDownloader www.foo.com/abc/index.txt 0-999

command. Then the first 1000 bytes of all files except the first one in the index file www.foo.com/abc/index.txt are downloaded. After the connection is terminated, the command-line of the client may be as follows:

 

Command-line:

URL of the index file: www.foo.com/abc/index.txt

Lower endpoint = 0

Upper endpoint = 999

Index file is downloaded

There are 5 files in the index

1.  www.cs.bilkent.edu.tr/file.txt is not found

2.  www.cs.bilkent.edu.tr/folder2/temp.txt (range = 0-999) is downloaded

3.  wordpress.org/plugins/about/readme.txt (range = 0-999) is downloaded

4.  humanstxt.org/humans.txt (range = 0-999) is downloaded

5.  www.cs.bilkent.edu.tr /cs421/deneme.txt (range = 0-999) is downloaded
 

Example run 3. Let your program start with the FileDownloader www.foo.com/abc/index.txt 1587-6999

command. The last byte of the fourth file is 1586, so it is not requested. Hence, the bytes in the range 1587-6999 are requested for the second, third, and fifth files. Since the second and third files do not include 7000 bytes, the response includes the bytes in the ranges 1587-5999 and 1587-4566, respectively. After the connection is terminated, the command-line of the client may be as follows:

 

Command-line:

URL of the index file: www.foo.com/abc/index.txt

Lower endpoint = 1587

Upper endpoint = 6999

Index file is downloaded

There are 5 files in the index

1.  www.cs.bilkent.edu.tr/file.txt is not found

2.  www.cs.bilkent.edu.tr/folder2/temp.txt (range = 1587-5999) is downloaded

3.  wordpress.org/plugins/about/readme.txt (range = 1587-4566) is downloaded

4.  humanstxt.org/humans.txt (size = 1587) is not downloaded

5.  www.cs.bilkent.edu.tr/ cs421/deneme.txt (range = 1587-6999) is downloaded
 

 

More products