SP Programming Assignment #4 - A Multiclass Classifier with Threads
In Assignment 4, you are required to implement a multiclass classifier that uses threads. You should train a model to classify handwritten digits from the MNIST dataset. Most importantly, you must use threads to parallelize the matrix multiplication and so speed it up.
In training, you decide on your own how many iterations to train your classifier for.
For each iteration, you update your classifier with:
X:training data matrix (60000 * 784)
W:weight matrix (784 * 10)
(If you want to add a bias term to your classifier, you can make W 785*10 and add a column of 1’s to X.)
y_hat:predicted label (60000 * 10)
y:true label; you need to transform each label into a one-hot vector. (60000 * 10)
(For example, label = 2 becomes [0, 0, 1, 0, 0, 0, 0, 0, 0, 0].)
lr:learning rate (scalar)
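The update equations themselves do not appear in this copy of the handout. As a sketch only, one common formulation that is consistent with the symbols and shapes listed above is softmax regression trained by batch gradient descent. This is an assumption, not necessarily the rule the assignment intends; the softmax and the equation numbering are supplied here for illustration:

```latex
\hat{y} = \operatorname{softmax}(XW), \qquad
\operatorname{softmax}(z)_{ij} = \frac{e^{z_{ij}}}{\sum_{k=1}^{10} e^{z_{ik}}} \tag{1}

W \leftarrow W - lr \cdot X^{\top}(\hat{y} - y) \tag{2}
```

Under this reading, XW is (60000 * 784) times (784 * 10), giving the 60000 * 10 shape of y_hat, and X^T (y_hat - y) is 784 * 10, matching W.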
You need to create threads to accelerate the matrix multiplication in (1). We will tell you how many threads to create, and each thread should compute the multiplication for [60000 / thread_num] rows.
(For example, if thread_num = 1000, each thread is responsible for the multiplication of 60 rows ([60*784] * [784*10]) in (1).)
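The row-wise split described above could be sketched with pthreads roughly as follows. This is an illustration under assumptions, not the required implementation: the names `matmul_task`, `matmul_worker`, and `parallel_matmul` are hypothetical, the matrices are assumed to be row-major `double` arrays, and the chunk size is rounded up so that a thread count that does not divide the row count still covers every row (the handout does not say how a remainder should be handled).

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Work for one thread: rows [row_begin, row_end) of out = X * W. */
typedef struct {
    const double *X;   /* n_rows x n_in, row-major  */
    const double *W;   /* n_in x n_out, row-major   */
    double *out;       /* n_rows x n_out, row-major */
    int n_in, n_out;
    int row_begin, row_end;
} matmul_task;

static void *matmul_worker(void *arg)
{
    const matmul_task *t = arg;
    for (int i = t->row_begin; i < t->row_end; i++) {
        const double *x = t->X + (size_t)i * t->n_in;
        double *o = t->out + (size_t)i * t->n_out;
        for (int j = 0; j < t->n_out; j++)
            o[j] = 0.0;
        /* Walk W row by row so memory access stays sequential. */
        for (int k = 0; k < t->n_in; k++) {
            const double xk = x[k];
            const double *w = t->W + (size_t)k * t->n_out;
            for (int j = 0; j < t->n_out; j++)
                o[j] += xk * w[j];
        }
    }
    return NULL;
}

/* Computes out = X * W using thread_num threads.
 * Returns 0 on success, -1 if allocation or thread creation fails. */
int parallel_matmul(const double *X, const double *W, double *out,
                    int n_rows, int n_in, int n_out, int thread_num)
{
    assert(n_rows > 0 && n_in > 0 && n_out > 0 && thread_num > 0);

    pthread_t *tids = malloc((size_t)thread_num * sizeof *tids);
    matmul_task *tasks = malloc((size_t)thread_num * sizeof *tasks);
    if (tids == NULL || tasks == NULL) {
        free(tids);
        free(tasks);
        return -1;
    }

    /* Assumption: round the chunk size up; trailing threads may get
     * fewer rows, or none when thread_num exceeds n_rows. */
    const int chunk = (n_rows + thread_num - 1) / thread_num;
    int created = 0;
    int status = 0;

    for (int t = 0; t < thread_num; t++) {
        int begin = t * chunk;
        if (begin > n_rows)
            begin = n_rows;
        int end = begin + chunk;
        if (end > n_rows)
            end = n_rows;
        tasks[t] = (matmul_task){X, W, out, n_in, n_out, begin, end};
        if (pthread_create(&tids[t], NULL, matmul_worker, &tasks[t]) != 0) {
            status = -1;
            break;
        }
        created++;
    }

    /* Join only the threads that were actually started. */
    for (int t = 0; t < created; t++)
        pthread_join(tids[t], NULL);

    free(tids);
    free(tasks);
    return status;
}
```

With the sizes from the example, a call such as `parallel_matmul(X, W, scores, 60000, 784, 10, 1000)` would hand 60 rows to each thread. Because every thread writes a disjoint range of output rows, no locking is needed.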
To evaluate accuracy, you may choose, for each image, the label with the largest probability among the 10 classes.
To be fair, you may not use a pre-trained weight matrix; you must initialize your weight matrix yourself, for example as a 785*10 matrix of all 0.0’s or all 0.5’s.
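Picking the most probable class and scoring it against the true labels might look like the sketch below. The helper names `argmax_row` and `accuracy` are hypothetical, and the layout (a row-major array of per-class scores, with labels stored as `unsigned char`) is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the largest of n_classes scores; ties go to the lowest index. */
int argmax_row(const double *scores, int n_classes)
{
    assert(scores != NULL && n_classes > 0);
    int best = 0;
    for (int j = 1; j < n_classes; j++)
        if (scores[j] > scores[best])
            best = j;
    return best;
}

/* Fraction of rows whose predicted class equals the true label. */
double accuracy(const double *y_hat, const unsigned char *labels,
                int n_rows, int n_classes)
{
    assert(y_hat != NULL && labels != NULL && n_rows > 0);
    int correct = 0;
    for (int i = 0; i < n_rows; i++)
        if (argmax_row(y_hat + (size_t)i * n_classes, n_classes) == labels[i])
            correct++;
    return (double)correct / n_rows;
}
```

Since only the position of the maximum matters, the same function works on unnormalized scores as well as on probabilities.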
2. Format of Inputs & Outputs
Input:MNIST dataset (including 4 files)
X_train:60000 images, 784 pixels for each image, value:0~255
y_train:60000 labels, value:0~9
X_test:10000 images
y_test:10000 labels
(Note: You can use X_test and y_test to check your classifier’s accuracy, but don’t use them to train your classifier.)
data link : https://drive.google.com/drive/folders/1wips8uJtKFIlnXVzu2fDC_SRjbxaxiD2?usp=sharing
Output:result.csv
format:
3. Sample Execution
./hw4 [X_train] [y_train] [X_test] [number of threads]
(compile: gcc hw4.c -lm -lpthread -O3 -o hw4)
(When we mark your assignment, we will use our private testing dataset (size = 10000). Your program should write result.csv for the testing data we specify.)
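Since the thread count arrives as the last command-line argument, it may be worth validating it before creating any threads. A minimal sketch, assuming a hypothetical helper named `parse_thread_count` that rejects anything other than a positive integer:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parses s as a positive thread count.
 * Returns the count, or -1 if s is not a positive integer that fits in int. */
int parse_thread_count(const char *s)
{
    assert(s != NULL);
    char *end = NULL;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0' || v <= 0 || v > INT_MAX)
        return -1;
    return (int)v;
}
```

In `main`, this would typically be called as `parse_thread_count(argv[4])` after checking that `argc` is 5, matching the four arguments shown in the sample command.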