ENCE360 Assignment 1: HTTP Client, Concurrent Queue and File Merging & Removal

Part 1 – http client  
Write an implementation of an http client in http.c which performs an HTTP 1.0 query to a website and returns the response string (including header and page content). Note that a server may not respect the range size you specify, so be wary of the webpages you test against. A good one to use is imgur.com.  

Make sure to test your implementation against a selection of binary files, and text files – small files and big. Be very careful using string manipulation functions, e.g. strcpy – they will not copy any binary data containing a '\0' – so you will want to use the binary counterparts such as memcpy.

For downloading bigger files you will need to think about how to allocate memory as you go; functions such as realloc let you resize a buffer dynamically without manually copying the contents each time.
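The two points above (binary-safe copying and growing a buffer as data arrives) can be sketched together. This is a minimal illustration, not the required design: the ByteBuffer type and byte_buffer_append function are hypothetical names, and the starting capacity of 1024 bytes and the doubling policy are assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer; names are illustrative only. */
typedef struct {
    char  *data;     /* heap storage, may contain '\0' bytes */
    size_t length;   /* bytes currently in use */
    size_t capacity; /* bytes currently allocated */
} ByteBuffer;

/* Append n bytes from src. Returns 0 on success, -1 if allocation fails. */
int byte_buffer_append(ByteBuffer *buf, const void *src, size_t n)
{
    assert(buf != NULL && (src != NULL || n == 0));
    if (buf->length + n > buf->capacity) {
        size_t new_cap = buf->capacity ? buf->capacity : 1024;
        while (new_cap < buf->length + n)
            new_cap *= 2;               /* doubling keeps reallocations rare */
        char *grown = realloc(buf->data, new_cap);
        if (grown == NULL)
            return -1;                  /* the old block is still valid */
        buf->data = grown;
        buf->capacity = new_cap;
    }
    if (n > 0)
        memcpy(buf->data + buf->length, src, n); /* binary-safe, unlike strcpy */
    buf->length += n;
    return 0;
}
```

A zero-initialised ByteBuffer is a valid empty buffer, and the caller releases it with free(buf.data). Tracking the length separately from the data is what makes embedded '\0' bytes safe.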

Functions which you will need to implement this include: memset, getaddrinfo, socket, connect, realloc, memcpy, perror and free

Also possibly useful: fdopen, dup

Your lecture notes may be useful, and this is a good reference: https://beej.us/guide/bgnet/html/multi/syscalls.html
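As a rough sketch of how getaddrinfo, socket and connect fit together, the following opens a TCP connection to a host. The function name connect_to_host is hypothetical, and passing the port as a string (for example "80") is an assumption about how you might structure your own code.

```c
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns a connected TCP socket, or -1 on failure. */
int connect_to_host(const char *host, const char *port)
{
    assert(host != NULL && port != NULL);

    struct addrinfo hints, *results, *entry;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */

    int status = getaddrinfo(host, port, &hints, &results);
    if (status != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(status));
        return -1;
    }

    /* Try each returned address until one connects. */
    int sockfd = -1;
    for (entry = results; entry != NULL; entry = entry->ai_next) {
        sockfd = socket(entry->ai_family, entry->ai_socktype,
                        entry->ai_protocol);
        if (sockfd == -1)
            continue;
        if (connect(sockfd, entry->ai_addr, entry->ai_addrlen) == 0)
            break;                    /* connected */
        close(sockfd);
        sockfd = -1;
    }
    if (sockfd == -1)
        fprintf(stderr, "could not connect to %s:%s\n", host, port);

    freeaddrinfo(results);            /* the address list is no longer needed */
    return sockfd;
}
```

The caller would then write the request to the returned descriptor, read until the server closes the connection, and finally close the socket.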

 

Sample HTTP GET request
To retrieve the file www.canterbury.ac.nz/postgrad/documents/cribphdmay.doc:

"GET /page HTTP/1.0\r\n
Host: host\r\n
Range: bytes=0-500\r\n
User-Agent: getter\r\n\r\n"

Where host = "www.canterbury.ac.nz" and page = "postgrad/documents/cribphdmay.doc". The Range portion is optional, but when specified allows you to retrieve part of a file. See this MDN page for more information.
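One way to assemble this request string is with snprintf, as sketched below. The function name build_get_request and its parameters are hypothetical; the header lines themselves follow the sample request above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes a ranged GET request into out.
 * Returns the request length, or -1 if it did not fit in out_size bytes. */
int build_get_request(char *out, size_t out_size,
                      const char *host, const char *page,
                      long first_byte, long last_byte)
{
    assert(out != NULL && host != NULL && page != NULL);
    int n = snprintf(out, out_size,
                     "GET /%s HTTP/1.0\r\n"
                     "Host: %s\r\n"
                     "Range: bytes=%ld-%ld\r\n"
                     "User-Agent: getter\r\n\r\n",
                     page, host, first_byte, last_byte);
    if (n < 0 || (size_t)n >= out_size)
        return -1;                    /* encoding error or truncated output */
    return n;
}
```

Checking the return value of snprintf against the buffer size guards against silently sending a truncated request when the page path is long.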

 

Example output
Test programs http_test and http_download are provided for you to test your code. Example output of http_test is shown below. It is implemented in test/http_test.c and is built by the Makefile by default.

./http_test www.thomas-bayer.com sqlrest/CUSTOMER/3/

Header:

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: application/xml
Date: Tue, 02 Sep 2014 04:47:16 GMT
Connection: close
Content-Length: 235

 

Content:

<?xml version="1.0"?><CUSTOMER xmlns:xlink="http://www.w3.org/1999/xlink">
    <ID>3</ID>
    <FIRSTNAME>Michael</FIRSTNAME>
    <LASTNAME>Clancy</LASTNAME>
    <STREET>542 Upland Pl.</STREET>
    <CITY>San Francisco</CITY>
</CUSTOMER>
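The header and content in a response like this are separated by the first blank line, i.e. the byte sequence "\r\n\r\n". A minimal sketch of locating that boundary in a binary-safe way follows; the function name find_content is hypothetical and is not part of the provided test programs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the first content byte of an
 * HTTP response, or NULL if no blank line was found. The number of content
 * bytes is stored in *content_length. Works on raw bytes, so the content
 * may contain '\0'. */
const char *find_content(const char *response, size_t length,
                         size_t *content_length)
{
    assert(response != NULL && content_length != NULL);
    for (size_t i = 0; i + 4 <= length; i++) {
        if (memcmp(response + i, "\r\n\r\n", 4) == 0) {
            *content_length = length - (i + 4);
            return response + i + 4;
        }
    }
    *content_length = 0;
    return NULL;
}
```

Using memcmp over an explicit length, rather than strstr, keeps the search correct when the content is a binary file.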

 

http_download does a similar thing but instead writes the downloaded file to disk using a filename you give it. A script is provided to test your implementation against the well-known file downloader wget:

./test_download.sh www.cosc.canterbury.ac.nz/research/reports/PhdTheses/2015/phd_1501
downloaded 13025535 bytes from www.cosc.canterbury.ac.nz/research/reports/PhdTheses/2015/phd_1501.pdf
Files __template and __output are identical

./test_download.sh static.canterbury.ac.nz/web/graphics/blacklogo.png
downloaded 9550 bytes from static.canterbury.ac.nz/web/graphics/blacklogo.png
Files __template and __output are identical

Part 2 – concurrent queue  
Write a classic implementation of a concurrent FIFO queue in queue.c which allows multiple producers and consumers to communicate in a thread-safe way.  Before studying the requirements, it is advised to study the test program queue_test for an example of how such a queue is intended to be used.  

Hints: Use semaphores (sem_init, sem_wait, sem_post, etc.) and mutexes for thread synchronization. Use the minimum number of synchronization primitives that still maintains correctness and maximum performance.
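The classic arrangement these hints point toward is a bounded buffer guarded by two counting semaphores and one mutex. The sketch below is one possible shape under stated assumptions: the BoundedQueue type and the bq_* function names are hypothetical (the interface you must implement is the one used by queue_test), and handling of sem_wait being interrupted is omitted for brevity.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>

/* Hypothetical bounded FIFO queue; the assignment's real interface may differ. */
typedef struct {
    void **items;
    int capacity, head, tail;
    sem_t free_slots;      /* counts empty slots; producers wait on it */
    sem_t used_slots;      /* counts filled slots; consumers wait on it */
    pthread_mutex_t lock;  /* protects head, tail and items */
} BoundedQueue;

BoundedQueue *bq_create(int capacity)
{
    assert(capacity > 0);
    BoundedQueue *q = malloc(sizeof *q);
    if (q == NULL)
        return NULL;
    q->items = malloc(sizeof *q->items * (size_t)capacity);
    if (q->items == NULL) {
        free(q);
        return NULL;
    }
    q->capacity = capacity;
    q->head = q->tail = 0;
    sem_init(&q->free_slots, 0, (unsigned)capacity);
    sem_init(&q->used_slots, 0, 0);
    pthread_mutex_init(&q->lock, NULL);
    return q;
}

/* Blocks while the queue is full. */
void bq_put(BoundedQueue *q, void *item)
{
    sem_wait(&q->free_slots);
    pthread_mutex_lock(&q->lock);
    q->items[q->tail] = item;
    q->tail = (q->tail + 1) % q->capacity;
    pthread_mutex_unlock(&q->lock);
    sem_post(&q->used_slots);
}

/* Blocks while the queue is empty. */
void *bq_get(BoundedQueue *q)
{
    sem_wait(&q->used_slots);
    pthread_mutex_lock(&q->lock);
    void *item = q->items[q->head];
    q->head = (q->head + 1) % q->capacity;
    pthread_mutex_unlock(&q->lock);
    sem_post(&q->free_slots);
    return item;
}

void bq_destroy(BoundedQueue *q)
{
    sem_destroy(&q->free_slots);
    sem_destroy(&q->used_slots);
    pthread_mutex_destroy(&q->lock);
    free(q->items);
    free(q);
}
```

Waiting on the semaphore before taking the mutex matters: a thread that blocked on a semaphore while holding the lock would prevent every other thread from making progress.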

 

 

Testing
A test program queue_test is provided for you to test your code and illustrate how to use the concurrent queue. Note that it is not a comprehensive test: the test used for marking will be much more stringent on correctness, so it is possible to pass queue_test yet receive low marks. You may wish to write your own tests and/or test with the provided downloader program.

 

./queue_test
total sum: 1783293664, expected sum: 1783293664

(This should complete within two seconds on the lab machines)

 

Part 3 – chunk sizes & partial-content downloads  
In http.c, write an implementation to determine the maximum byte size to request in a partial download.

You should use this to determine the number of partial-content downloads your program needs to execute. A server may not respect the range size. It is recommended to use a HEAD request here since only the headers are needed. The request is formed in the same way as a GET request:

"HEAD /page HTTP/1.0\r\n
Host: host\r\n
User-Agent: getter\r\n\r\n"

Where host = "i.imgur.com" and page = "VuKnN5P.jpg".
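One plausible use of the HEAD response is to read the total file size from the Content-Length header and divide it by the chunk size to get the number of ranged requests. The sketch below assumes exactly that; parse_content_length and count_chunks are hypothetical names, and the exact-case match on "Content-Length:" is a simplification, since servers may send header names in a different case.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the Content-Length value found in a
 * NUL-terminated header block, or -1 if the header is absent.
 * Assumes the server spells the header name exactly "Content-Length:". */
long parse_content_length(const char *headers)
{
    assert(headers != NULL);
    const char *field = strstr(headers, "Content-Length:");
    if (field == NULL)
        return -1;
    /* strtol skips the whitespace after the colon. */
    return strtol(field + strlen("Content-Length:"), NULL, 10);
}

/* Number of ranged requests needed to cover total_bytes when each request
 * asks for at most chunk_bytes (ceiling division). */
long count_chunks(long total_bytes, long chunk_bytes)
{
    assert(total_bytes >= 0 && chunk_bytes > 0);
    return (total_bytes + chunk_bytes - 1) / chunk_bytes;
}
```

For example, a 9550-byte file fetched in 4096-byte chunks needs three requests, the last covering the remaining 1358 bytes.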

 

 

Part 4 – File Merging & Removal  
In downloader.c, two function prototypes have been defined for you: merge_files and remove_chunk_files. Implement merge_files first and check that your output is as expected. Once file merging works correctly, explore removing the partial-download files.

NOTE: Feel free to discard either of these functions and use a more optimal approach (explain this in your report).
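As a rough illustration of the merge-then-remove idea, the sketch below concatenates numbered chunk files into one output file and then deletes them. Everything about it is an assumption: the function names merge_chunks and remove_chunks, their parameters, and the "<prefix>-<index>" naming scheme are hypothetical and will not match the prototypes in downloader.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: concatenates "<prefix>-0" .. "<prefix>-(count-1)" into
 * out_path. Returns 0 on success, -1 on any I/O failure. */
int merge_chunks(const char *prefix, int chunk_count, const char *out_path)
{
    assert(prefix != NULL && out_path != NULL && chunk_count >= 0);
    FILE *out = fopen(out_path, "wb");
    if (out == NULL)
        return -1;

    char name[4096], block[8192];
    int result = 0;
    for (int i = 0; i < chunk_count && result == 0; i++) {
        snprintf(name, sizeof name, "%s-%d", prefix, i);
        FILE *in = fopen(name, "rb");
        if (in == NULL) {
            result = -1;
            break;
        }
        size_t n;
        /* Copy in fixed-size blocks so large chunks need little memory. */
        while ((n = fread(block, 1, sizeof block, in)) > 0) {
            if (fwrite(block, 1, n, out) != n) {
                result = -1;
                break;
            }
        }
        if (ferror(in))
            result = -1;
        fclose(in);
    }
    if (fclose(out) != 0)
        result = -1;
    return result;
}

/* Hypothetical: deletes the same chunk files. Returns 0 if all were removed. */
int remove_chunks(const char *prefix, int chunk_count)
{
    assert(prefix != NULL && chunk_count >= 0);
    char name[4096];
    int result = 0;
    for (int i = 0; i < chunk_count; i++) {
        snprintf(name, sizeof name, "%s-%d", prefix, i);
        if (remove(name) != 0)
            result = -1;
    }
    return result;
}
```

Opening files in binary mode ("rb" and "wb") and copying with fread and fwrite keeps the merge correct for non-text downloads.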

 

 

Part 5 – report
Algorithm analysis
Describe the algorithm found in downloader.c (lines 228-264) step by step.  

How is this similar to algorithms found in your notes? Which is it most like, and are there any improvements which could be made?

 

Performance analysis 
Provide a small analysis of performance of the downloader program showing how performance changes as the number of worker threads increases. What is the optimal number of worker threads? What are some of the issues when the number of threads is increased?

Run the downloader several times over different inputs, covering both small and large files (I have provided some URL lists, e.g. test.txt and large.txt).

Usage (downloading files in test.txt with 12 threads and saving to directory 'download'):

./downloader test.txt 12 download

How does file size affect performance? Are there any parts of your http downloader which can be modified to download files faster?

A pre-compiled version of the downloader exists in 'bin/downloader'. How does your implementation compare? If there's a large discrepancy when the number of threads increases, it's likely there's something wrong with your concurrent queue!

 

Analyse the approach used for file merging and removal. Are there any improvements that can be made? Why? Is the current method of partially downloading every file optimal? If not, suggest a way to improve performance. 
