CSc 361: Computer Communications and Networks
Programming Assignment 1: Smart Web Client
1 Goal
The project is to build a web client tool that collects information about a web server. The purpose of this project is twofold:
• to provide students with hands-on experience with socket programming in Python,
• to help students understand the application-layer protocols HTTP/HTTPS. Note that HTTPS is not a standalone protocol; it is HTTP over Transport Layer Security (TLS). In this assignment, your main focus is HTTP, not TLS.
2 Background
2.1 HTTP
HTTP stands for HyperText Transfer Protocol and is used for communication between web clients and web servers.
The web client initiates a conversation by opening a connection to a web server. Once a connection is set up, the client sends an HTTP request, and the server sends an HTTP response back to the client. An HTTP request consists of two parts: a header and a body. The header specifies whether a body follows it.
The first line of any request header (the request line) consists of three fields:
• the method field: it can take several different values, including GET, POST, HEAD, and so on;
• the URL field: it identifies a network resource, e.g., “http://www.csc.uvic.ca/index.html”;
• the HTTP version field, e.g., HTTP/1.1.
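The following is a minimal sketch of the request format just described. It is one possible approach, not a required implementation. The hypothetical helper `build_request` assembles a GET request from four parts: the request line with its three fields, a `Host` header (mandatory in HTTP/1.1), a `Connection: close` header that asks the server to close the connection after responding, and the blank line that ends the header. This sketch puts only the path in the request line, whereas the sample request in Section 3 uses the full URL.

```python
def build_request(host: str, path: str = "/", version: str = "HTTP/1.1") -> bytes:
    """Assemble a minimal GET request (hypothetical helper)."""
    # Request line: method, URL (here only the path), and HTTP version
    request_line = f"GET {path} {version}\r\n"
    # Host is mandatory in HTTP/1.1; "Connection: close" asks the server
    # to close the connection once the response has been sent
    headers = f"Host: {host}\r\nConnection: close\r\n"
    # A blank line ends the header; this GET request carries no body
    return (request_line + headers + "\r\n").encode("ascii")


# build_request("www.uvic.ca", "/index.html") yields
# b"GET /index.html HTTP/1.1\r\nHost: www.uvic.ca\r\nConnection: close\r\n\r\n"
```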
The response from a server also has two parts: a header and a body. The first line of a response header (the status line) consists of:
• the HTTP version field,
• the status code field,
• the phrase field.
Two common status codes are 200 and 404. The status code 200 means that the request succeeded and the information is returned in the response. The status code 404 means that the requested document does not exist on this server. Two example response messages are “HTTP/1.0 404 Not Found\r\n\r\n” and “HTTP/1.0 200 OK\r\n\r\n data data data ...”. Two other status codes are also useful for this assignment: 505 (“HTTP Version Not Supported”) and 302 (“Found”), which is used for URL redirection.
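A few string operations are enough to split the status line into its three fields. The helper names `parse_status_line` and `is_redirect` are hypothetical. The sketch assumes a well-formed response that uses `\r\n` line endings, as in the examples above. `is_redirect` also accepts 301 (Moved Permanently), which, like 302, carries the new URL in a `Location` header.

```python
from typing import Tuple


def parse_status_line(response: bytes) -> Tuple[str, int, str]:
    """Return (version, status code, phrase) from a raw response (hypothetical helper)."""
    # The status line ends at the first CRLF
    first_line = response.split(b"\r\n", 1)[0].decode("iso-8859-1")
    # Split at most twice: the phrase itself may contain spaces ("Not Found")
    parts = first_line.split(" ", 2)
    version, code = parts[0], int(parts[1])
    phrase = parts[2] if len(parts) > 2 else ""
    return version, code, phrase


def is_redirect(code: int) -> bool:
    """True for 301/302, whose Location header names the new URL."""
    return code in (301, 302)
```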
2.2 URI
URI stands for Uniform Resource Identifier; the term covers both Uniform Resource Locators (URLs) and Uniform Resource Names (URNs). A URI is a formatted string that identifies a network resource. It generally has the format protocol://host[:port]/filepath. When a port is not specified, the default port number is 80 for HTTP and 443 for HTTPS.
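The sketch below shows one way to parse the `protocol://host[:port]/filepath` format by hand. `parse_uri` and the `URI` tuple are hypothetical names. The sketch assumes that input without a protocol (such as `www.uvic.ca`) means HTTP, and it does not handle user information, fragments, or IPv6 literals.

```python
from typing import NamedTuple

# Default ports from the text: 80 for HTTP, 443 for HTTPS
DEFAULT_PORTS = {"http": 80, "https": 443}


class URI(NamedTuple):
    scheme: str
    host: str
    port: int
    path: str


def parse_uri(text: str, default_scheme: str = "http") -> URI:
    """Parse protocol://host[:port]/filepath (hypothetical helper)."""
    text = text.strip()
    if "://" in text:
        scheme, rest = text.split("://", 1)
    else:
        # Bare input such as "www.uvic.ca": assume the default scheme
        scheme, rest = default_scheme, text
    scheme = scheme.lower()
    # host[:port] runs up to the first "/"; the rest is the file path
    authority, slash, path = rest.partition("/")
    path = "/" + path if slash else "/"
    host, colon, port_text = authority.partition(":")
    # Unknown schemes raise KeyError here unless a port is given explicitly
    port = int(port_text) if colon else DEFAULT_PORTS[scheme]
    return URI(scheme, host, port, path)
```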
2.3 Cookies
An HTTP cookie is a small piece of data that a server sends to the user’s web browser. The browser may store it and send it back with the next request to the same server. Typically, it is used to tell whether two requests came from the same browser, for example, to keep a user logged in. Cookies remember stateful information for the stateless HTTP protocol. They have many applications on the web, such as tracking, authentication, and web analytics. For this reason, cookies also raise many security and privacy concerns.
The textbook includes a brief introduction to cookies. More detailed information can be found at https://developer.mozilla.org/en-US/docs/Web/HTTP/Cookies. Python includes a dedicated module for handling cookies: https://docs.python.org/3/library/http.cookies.html. Nevertheless, you are not allowed to use this module because it defeats the purpose of this assignment: understanding the nuts and bolts of HTTP.
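Because `http.cookies` is off limits, cookie attributes have to be extracted by hand. The sketch below uses two hypothetical helpers, `parse_set_cookie` and `cookies_from_header`. It splits each `Set-Cookie` header value on `;` and keeps only what the assignment asks for: the cookie name and the `expires` and `domain` attributes. It assumes well-formed headers. An `expires` date contains a comma but no semicolon, so splitting on `;` keeps the date intact.

```python
from typing import Dict, List, Optional

Cookie = Dict[str, Optional[str]]


def parse_set_cookie(header_value: str) -> Cookie:
    """Extract name, expires and domain from one Set-Cookie value (hypothetical helper)."""
    # Attributes are separated by ";"; the first pair is always name=value
    pairs = [p.strip() for p in header_value.split(";")]
    name = pairs[0].split("=", 1)[0].strip()
    cookie: Cookie = {"name": name, "expires": None, "domain": None}
    for attr in pairs[1:]:
        key, _, value = attr.partition("=")
        key = key.strip().lower()  # attribute names are case-insensitive
        if key in ("expires", "domain"):
            cookie[key] = value.strip()
    return cookie


def cookies_from_header(header_text: str) -> List[Cookie]:
    """Collect every Set-Cookie field from a raw response header."""
    cookies = []
    for line in header_text.split("\r\n"):
        # Split at the first ":" only, so times like 00:00:01 stay in the value
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "set-cookie":
            cookies.append(parse_set_cookie(value))
    return cookies
```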
3 Project Description
You are required to build a smart web client tool, called SmartClient, in Python. Note that, for consistency, programs written in other languages will not be accepted!
Given the URL of a web server, your SmartClient needs to find out the following information about the web server:
1. whether or not the web server supports HTTPS,
2. whether or not the web server supports HTTP/1.1,
3. whether or not the web server supports HTTP/2,
4. the name, the expiry time (if any), and the domain name (if any) of each cookie that the web server will use.
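A common way to check items 1 and 3 is to attempt a TLS handshake on port 443 and use ALPN (Application-Layer Protocol Negotiation) to offer `h2` alongside `http/1.1`. If the handshake succeeds, the server speaks HTTPS. If the server also selects `h2`, it supports HTTP/2 over TLS. The sketch below uses Python's standard `ssl` module, and the helper names are hypothetical. Treat it as one heuristic rather than the required method: it does not detect cleartext HTTP/2 (`h2c`), and it reports a certificate verification failure as "no HTTPS".

```python
import socket
import ssl
from typing import Optional


def make_tls_context() -> ssl.SSLContext:
    """Default (certificate-verifying) TLS context that offers h2 and http/1.1."""
    context = ssl.create_default_context()
    # ALPN: the server picks one of these protocols during the handshake
    context.set_alpn_protocols(["h2", "http/1.1"])
    return context


def probe_tls(host: str, port: int = 443, timeout: float = 5.0) -> Optional[str]:
    """Return the ALPN protocol the server selected, "" if it selected none,
    or None if no TLS connection could be established (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with make_tls_context().wrap_socket(sock, server_hostname=host) as tls:
                return tls.selected_alpn_protocol() or ""
    except OSError:
        # Covers refused connections, timeouts, DNS failures and ssl.SSLError
        return None
```

With this sketch, a result of `None` means no HTTPS, `"h2"` means HTTP/2 is supported, and any other string means HTTPS without HTTP/2.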
Your program first accepts a URI as input and parses it; in the example below, the URI is passed on the command line. Then it connects to the server, sends an HTTP request, and receives an HTTP response. You should also implement a routine that prints out the response from the server, marking the header and the body. When you finish the client, you can try to connect to any HTTP server. For instance, give “www.uvic.ca” as the input to the client program and see what response you get.
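The connect, send, receive, and print cycle can be sketched with the standard `socket` and `ssl` modules. The helpers `fetch`, `split_response`, and `print_response` are hypothetical. `fetch` reads until the server closes the connection, so it assumes the request asked for `Connection: close`. With keep-alive, `recv` would instead block until the 5-second timeout raises an exception.

```python
import socket
import ssl
from typing import Tuple


def split_response(raw: bytes) -> Tuple[str, bytes]:
    """Split a raw response at the first blank line into (header, body)."""
    header, _, body = raw.partition(b"\r\n\r\n")
    # Header fields are ASCII-compatible; iso-8859-1 never fails to decode
    return header.decode("iso-8859-1"), body


def fetch(host: str, port: int, request: bytes, use_tls: bool) -> bytes:
    """Send one request and read until the server closes the connection."""
    sock = socket.create_connection((host, port), timeout=5)
    try:
        if use_tls:
            sock = ssl.create_default_context().wrap_socket(sock, server_hostname=host)
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # an empty read means the server closed the connection
                break
            chunks.append(data)
    finally:
        sock.close()
    return b"".join(chunks)


def print_response(raw: bytes) -> None:
    """Print the response with the header and body clearly marked."""
    header, body = split_response(raw)
    print("--- Response header ---")
    print(header)
    print()
    print("--- Response body ---")
    print(body.decode("utf-8", errors="replace"))
```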
As an example, after you run your code with
% python SmartClient.py www.uvic.ca
your SmartClient may output the received response from the server (optional), e.g.,
---Request begin---
GET http://www.uvic.ca/index.html HTTP/1.1
Host: www.uvic.ca
Connection: Keep-Alive

---Request end---
HTTP request sent, awaiting response...

---Response header ---
HTTP/1.1 200 OK
Date: Tue, 02 Jan 2018 22:42:27 GMT
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Set-Cookie: SESSID_UV_128004=VD3vOJhqL3YUbmaZSTJre1; path=/; domain=www.uvic.ca
Set-Cookie: uvic_bar=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; Max-Age=0; path=/; dom
Keep-Alive: timeout=5, max=100
Connection: close
Content-Type: text/html; charset=UTF-8
Set-Cookie: www_def=2548525198.20480.0000; path=/
Set-Cookie: TS01a564a5=0183e07534a2511a2dcd274bee873845d67a2c07b7074587c948f80a42c427b1f7ea
Set-Cookie: TS01c8da3c=0183e075346a73ab4544c7b9ba9d7fa022c07af441fc6214c4960d6a9d0db2896; p
Set-Cookie: TS014bf86f=0183e075347c174a4754aeb42d669781e0fafb1f43d3eb2783b1354159a9ad8d81f7

--- Response body ---
Body Body .... (the actual content)

Note that some lines in the above output were truncated.
Your code might need to send multiple requests to find out the required information. Your code should output the final results (mandatory), for example:
website: www.uvic.ca
1. Supports of HTTPS: yes
2. Supports http1.1: yes
3. Supports http2: no
4. List of Cookies:
cookie name: SESSID_UV_128004, domain name: www.uvic.ca
cookie name: uvic_bar, expires time: Thu, 01-Jan-1970 00:00:01 GMT, domain name: .uvic.ca
cookie name: www_def
cookie name: TS01a564a5
cookie name: TS01c8da3c, domain name: www.uvic.ca
cookie name: TS014bf86f, domain name: .uvic.ca
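Once the individual checks are done, the final summary can be assembled as plain text. The sketch below is one possible formatter that follows the layout of the sample output, and all names in it are hypothetical. It makes three assumptions:

* cookie fields are always joined with commas;
* each cookie dictionary has `name`, `expires`, and `domain` keys;
* `http11_from_status` treats an `HTTP/1.1` status line that is not a 505 as evidence of HTTP/1.1 support, which is a heuristic rather than a definitive test.

```python
from typing import Dict, List, Optional

Cookie = Dict[str, Optional[str]]


def http11_from_status(version: str, code: int) -> bool:
    """Assumed heuristic: an HTTP/1.1 status line that is not 505 means support."""
    return version == "HTTP/1.1" and code != 505


def format_cookie(cookie: Cookie) -> str:
    """One line per cookie; optional fields are printed only when present."""
    parts = [f"cookie name: {cookie['name']}"]
    if cookie.get("expires"):
        parts.append(f"expires time: {cookie['expires']}")
    if cookie.get("domain"):
        parts.append(f"domain name: {cookie['domain']}")
    return ", ".join(parts)


def format_report(website: str, https: bool, http11: bool, http2: bool,
                  cookies: List[Cookie]) -> str:
    """Build the mandatory summary in the layout of the sample output."""
    def yes_no(flag: bool) -> str:
        return "yes" if flag else "no"

    lines = [
        f"website: {website}",
        f"1. Supports of HTTPS: {yes_no(https)}",
        f"2. Supports http1.1: {yes_no(http11)}",
        f"3. Supports http2: {yes_no(http2)}",
        "4. List of Cookies:",
    ]
    lines.extend(format_cookie(c) for c in cookies)
    return "\n".join(lines)
```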
3.1 Other Notes
1. Regarding other printouts: anything not specified in Assignment 1 is optional. For example, you can decide whether or not to print out the IP address, port number, and so on. When TAs test your code, if it works without any problem, you are fine even if you print nothing beyond what Assignment 1 requires. However, if your code does not work, TAs will not spend time figuring out what is wrong, and you will get zero marks for the required function (refer to the table in Section 5 of Assignment 1). In this case, if your code prints some intermediate results, TAs will have an idea of how far you got and may give you partial marks based on their own judgement.
2. Regarding the readme file: the readme file is important. Without it, TAs will not know how to compile and run your code. It would waste our time to deal with your complaint if TAs could not run your code and gave you a zero.
3. For more information on HTTP, HTML, URI, etc., please refer to http://www.w3.org. It is the home page of the World Wide Web Consortium (W3C), and you will find many useful links to subjects related to the World Wide Web there.
4 Schedule
To help you finish this programming assignment successfully, its schedule has been synchronized with both the lectures and the tutorials/labs. Before the final deadline, there are three tutorial sessions to help you finish the assignment. The schedule is as follows:
Session   Tutorial                                    Milestones
Tut 1     P1 spec go-through, design hints, python    design and code skeleton
Tut 2     socket programming and testing              alpha code done
Tut 3     socket programming and last-minute help     beta code done
5 Deliverables and Marking Scheme
For your final submission of each assignment, you are required to submit your source code to Brightspace in a single zip file (double-check your zip file to make sure all required files are included before submission!). You should include a readme file that tells the TA how to compile and run your code.
Note: For consistency and ease of testing, you should test and run your code on the server linux.csc.uvic.ca using python3 and the Python packages available on that server. In other words, TAs will test your code on linux.csc.uvic.ca and give marks based on the test results on linux.csc.uvic.ca rather than the results from your local computer.
The marking scheme is as follows:
Components                                  Weight
Error handling                              10
Correct output for “support of HTTPS”       15
Correct output for “support of http1.1”     15
Correct output for “support of http2”       20
List of Cookies                             30
Code style                                  5
Readme.txt                                  5
Total Weight                                100
Important note: listing cookies is a very tricky business, and you may not get a unique, static answer because cookies change dynamically; some are created based on users’ interactive input. Some online tools, such as http://www.cookie-checker.com/, can find cookies that are triggered by JavaScript or PHP code. Nevertheless, finding those cookies is optional for this assignment. You will get a 10% bonus if you implement this part.
6 Plagiarism
This assignment is to be done individually. You are encouraged to discuss the design of your solution with your classmates, but each person must implement their own assignment.