Starting from:

$35

BBM204- Assignment 2: Hash Tables Solved

1           Introduction
Searching and finding data on a data structure is a painful process. Various search algorithms have been created and various data structures have been used. For example, the hashing data structure which is designed to use a special function called the Hash function is one of them. Hash function maps a given value with a particular key and it allows faster access to elements. The efficiency of mapping depends of the efficiency of the hash function used. Finding the appropriate hash function which will always produce a unique key is very difficult or even impossible in some cases. If the same key is generated from two different values, this is called collision. A hash function should be able to take any length value and it should be able to generate a key of the specified length. It should not allow collision. Hash table is data structure that holds data and keys obtained by hashing function together. Assume that we have the set of integer items 54, 26, 93, 17, 77, and 31. The size of hash table is m=11. Hash function takes the set of integer items and return a key as its hash value between 0 and m-1 (h(item)=item%11). A hash table with six items is shown in Figure 1.

Figure 1: Hash Table with six items

If, when an element is inserted, it hashes to the same value as an already inserted element, then we have a collision and need to resolve it. There are several methods for dealing with this:

Separate chaining
Open addressingLinear Probing
Quadratic Probing
Double Hashing
The load factor α of a hash table with n elements is given by the following formula:

                                                                             α = n/table.length                                                                        (1)

Thus, 0 <α< 1 for linear probing. (α can be greater than 1 for other collision resolution methods. With separate chaining, it is possible to have α > 1. In separate chaining, table size equals to the number of linked lists, so α > is the average length of the linked lists. Load factor refers to the

Page 1 of 7

size of the array and the density of the filled cells and it depends upon the hashing algorithm and collision algorithms selected. Some algorithms require a load factor of about .5 or .6; others allow a load factor of > 1.

In this assignment only three of the above-mentioned methods which are Separate Chaining ,Linear Probing and Double Hashing will be exemplified.

Separate Chaining is defined as a method by which linked lists of values are built in association with each location within the hash table when a collision occurs

If we want to add values 44, 55 and 20 respectively to the hash table, there will be a collision due to the same hash value. In Figure 2, separate chaining method is used for collision resolution. A list of all elements that hash to the same value is kept. The array elements are pointers to the first nodes of the lists. A new item is inserted to the front of the list or end of the list.

Figure 2: Collision resolution with separate chaining

Linear Probing During insert of key k to position p: If position p contains a different key, then examine positions p+1, p+2, etc.* until an empty position is found and insert k there. During a search for key k at position p: If position p contains a different key, then examine positions p+1, p+2, etc.* until either the key is found or an unused position is encountered.

Double Hashing is used to reduce secondary clustering. This method uses a second hash function to generate different probe sequences for different keys to drive the collision resolution.

Assume that we will add to hash table integer items 14, 17, 25, 37, 34, 16, 26 respectively and the first hash function is h1(item)=item%11 and the second hash function is h2(item) = item % 7 + 1. These functions are taken as examples and double hashing can be done by different functions. Double hashing can be done using :

                                                           (h1(item)+ i ∗ h2(item))%TABLE_SIZE                                                      (2)

Here h1() and h2() are hash functions and TABLE_SIZE is size of hash table. (We repeat by increasing i when collision occurs)

14 mod 11 = 3

17 mod 11 = 6

25 mod 11 = 3 (Collision) —> (h1(25) + 1 * h2(25)) % 11 =8

37 mod 11 = 4

34 mod 11 = 1

16 mod 11 = 5

26 mod 11 = 4 (Collision) —> (h1(26) + 1 * h2(26)) % 11 =10

Figure 3: Collision resolution with double hashing

2           Problem Definition
In this assignment, you are expected read a data file and create a hash table for the data and apply different collision resolution methods. The data set for this assignment is a collection of employees information data. Included in this data are as below:

employeeCode: A unique Identification code for each employee.
NRIC: The National Registration Identity Card number.
phone: Phone number of the employees.
Note: In this assignment you are NOT allowed to use Java’s (or other languages’) default hash table/map/sorting/linked list/etc utilities.

3           Experiment
3.1           Part A: Separate Chaining
Each line of the dataset represents one employee’s information. Create a class Employee whose instance can contain all the data from an employee. It should contain:The employeeCode
The NRIC
The phone
Create a basic linked list class.
Create a class MyHashTable. This will be your hash table class you will use to look up test data later in the assignment. It should minimally contain:A private table member. This should be an array of linked lists. The linked lists will hold EmployeeData objects. The size of the linked lists depends on load factor. You will handle collisions by inserting at the end of the linked list contained in the slot calculated by the hash function. (see below)
A private hash() method. This method will take a key (a phoneNumber) and generate a number (hashvalue) between 0 and TABLE_SIZE (the size of the table array) based on the given key. Generate the key value using the following hash function:
int hash1(int key)

{ return (key % TABLE_SIZE);

}

A public put() method. This method will take a key (a phoneNumber) and an Employee object, and insert it into the hash table. It should make use of hash() to decide which table index to insert the object into.
A public get() method. This method will take a key (a phoneNumber) and retrieve (return) the full Employee object that has the given phoneNumber. It should make use of hash() to decide the table slot to look for the object in.When you use hash() to get a table index, the linked list you find there will probably have multiple objects in it. Simply linear search through this linked list when you get there.
Make sure your get() function can handle cases where the full object doesn’t exist in the table!
3.2           Part B: Linear Probing and Double Hashing
In this part, you should read dataset as in the Part A and you should create the hash table for both Linear Probing and Double Hashing.

You should use hash function which is given above for first hash function
Second hash function for double hashing is given below:
public int doubleHashFunction(int key) { return 1 + (key % (TableSize - 1));

}

4           Input and Output Format
Sample Input

The collection of employee information data will be used for this assignment. The format of the file given to you as input is txt. This data includes EmployeeCode, NRIC, and PhoneNumber.A part of the dataset for creating a hash table is shown below in Figure 4.The input consists of some instances. The first line of the file is column names separated by a space. This is followed by lines containing three columns about employee information separated by a space.

Sample output

The representation of the output file is shown in Figure 5. The entire output file will be shared with you. First line of your output file should contain test case arguments as shown below: <input_file_name>,<load_factor1>,<load_factor2><search_key>

And it should be followed by hash tables of each part. And lastly, you should print comparisons and CPU time taken to search key given as a parameter for each collision method which are separate chaining, linear probing, and double hashing respectively. For testing your program, command-line parameters should be as shown below. Command-line parameters:

>javac main.java

>java main <input_file> <load factor of Part1> <load factor of Part2> <search key>

Figure 4: Sample Input

The assignment must be original, individual work. Duplicate or very similar assignments are both going to be considered as cheating.
The output of your program will be graded automatically. So create your outputs according to the shared sample output file.
Write READABLE SOURCE CODE block
You can ask your questions via Piazza (https://piazza.com/hacettepe.edu.tr/fall/bbm203) and you are supposed to be aware of everything discussed in Piazza.
Figure 5: Sample Output

 

More products