$35
CPE-321-02
Introduction of Computer Security
Assignment 3: Cryptographic Hash Functions
Objectives
You will explore the properties of cryptographic hash functions and find collisions on a truncated range. You will also crack a password file.
The objectives for this lab assignment are as follows:
● To explore Pseudo-randomness and Collision Resistance.
● To break real hashes (To crack a password file).
Programming Tools: Strong recommendation to use Python, Pycrypto and Bcrypt (or some other high-level language with decent crypto support).
Warning: Be aware that this lab can take considerable time for each task. Based on previous lab experience, the best estimate for task 2 can take 17-18 hours of processing time. Therefore, please do not wait to get this lab started.
Tasks
Please read the following tasks carefully, and finish the lab assignment accordingly:
1. Task 1: Exploring Pseudo-Randomness and Collision Resistance. In this task, you will investigate the pseudorandom and collision resistant properties of cryptographic hash functions.
a. Write a program that uses SHA256 to hash arbitrary inputs and print the resulting digests to the screen in hexadecimal format.
b. Hash two strings (of any length) whose Hamming distance is exactly 1 bit (i.e. differ in only 1 bit). Repeat this a few times.
c. Next we aim to find two strings that creates the same digest (called a collision). Because SHA256 is a secure cryptographic hash function (as far as we know), it is infeasible to use its full 256-bit output. Instead, you will limit its domain to between 8 and 50 bits. Modify your program to compute SHA256 hashes of arbitrary inputs, so that it is able to truncate the digests to between 8 and 50 bits (it doesn’t matter which bits of the output you choose, as long as you are consistent). Once you have completed the above, try to find a collision in your truncated hash domains (i.e. two different inputs, that create the same, truncated digest). There are at least two different ways of doing this (You need only do one):
• Find a target hash collision (weak collision resistance): Give 𝑚0, find 𝑚1 such that 𝐻(𝑚0) = 𝐻(𝑚1) and 𝑚0 ≠ 𝑚1. This is not hard to code, but it will take time to execute.
• Maximize your chances of finding a collision by relying on the Birthday problem: For any two messages 𝑚0 and 𝑚1, where 𝑚0 ≠ 𝑚1, find 𝐻(𝑚0) = 𝐻(𝑚1). This requires a little more code (and memory usage), but will find a collision more quickly. Consider using a hashtable or dictionary, but be careful about efficiency as finding collision on 50-bit outputs is right at the edge of what is feasible by an average computer.
For multiples of 2 bits (i.e. for digests sized 8, 10, 12, ..., 48, 50 bits), measure both the number of inputs and total time for a collision to be found. Create two graphs: one which plots digest size (along the x-axis) to collision time (y-axis), and one which plots digest size to number of inputs. Include these graphs in your report.
2. Task 2: Breaking Real Hashes. B-crypt is a hashing algorithm that is based on the blowfish algorithm and is designed to be a slow hash function. You have recovered a shadow file (you can download it from Canvas) which has users stored in the following format:
“User:$Algorithm$Workfactor$SaltHash” where salt is 22 characters base64 encoded and hash is the remainder.
For example:
“Bilbo:$2b$08$L.z8uq99JkFAvX/Q1jGRI.TzrHIIxWMoRi/VzO1sj/UvVFPg
W8dW.”
represents:
User: Bilbo
Algorithm: 2b or bcrypt
Workfactor: 8
Salt: L.z8uq99JkFAvX/Q1jGRI.
Hash value:TzrHIIxWMoRi/VzO1sj/UvVFPgW8dW.
This file is generated using the bcrypt library. Each user chooses a password that is a single word from the nltk word corpus between 6 and 10 letters long. Your job, crack each user’s password. You can’t use any password cracking tools. Your solution should be a custom cracking script. Record how much time it takes to crack the password for each user. If possible, feel free to crack passwords with common salts simultaneously.
Questions
1. What do you observed based on Task 1b? How many bytes are different between the two digests?
2. What is the maximum number of files you would ever need to hash to find a collision on an n-bit digest? Given the birthday bound, what is the expected number of hashes before a collision on a n-bit digest? Is this what you observed? Based on the data you have collected, speculate on how long it might take to find a collision on the full 256-bit digest.
3. Given an 8-bit digest, would you be able to break the one-way property (i.e. can you find any pre-image)? Do you think this would be easier or harder than finding a collision? Why or why not?
4. For Task 2, given your results, how long would it take to brute force a password that uses the format word1:word2 where both words are between 6 and 10 characters? What about word1:word2:word3? What about word1:word2:number where number is between 1 and 5 digits? Make sure to sufficiently justify your answers.
In Your Report
Please address the following in your report:
1. What you did and what you observed;
2. Include all code that you wrote;
3. Please include answers to all questions;
4. Please include any explanations of the surprising or interesting observations you made;
5. Write at a level that demonstrates your technical understanding, and do not shorthand ideas under the assumption that the reader already “knows what you mean”. Think of writing as if the audience was a smart colleague who may not have taken this class;
6. Describe what you did in sufficient detail that it can be reproduced. Please do not use screenshot of the VM to show the commands and code you used, but rather paste (or carefully retype) these into your report from your terminal. Submit your completed write up to Canvas in PDF format. Also remember that a picture can be worth a thousand words to graphing your results with good captions can be beneficial.