Modify the pargpu.cu code (from the class examples) so that the number of threads per block is specified as a two-dimensional variable. In CUDA, the dim3 data type can be used to specify two- or three-dimensional sizes. Specifically, the number of threads in each dimension of the 2-D block should be passed as parameters to the program. Your program should be executed as follows:
$ ./q1 {number of elements} {rows} {cols}
where rows * cols <= 1024 (the maximum number of threads per block).
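The launch configuration described above can be sketched as follows. pargpu.cu is not reproduced here, so the `addOne` kernel and the `flattenThread` helper are hypothetical stand-ins, and mapping cols to the x dimension and rows to the y dimension is an assumption rather than a requirement of the assignment.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Row-major position of a thread inside a 2-D block of width blockWidth.
__host__ __device__ inline int flattenThread(int tx, int ty, int blockWidth)
{
    return ty * blockWidth + tx;
}

// Hypothetical stand-in for the pargpu.cu kernel: adds 1 to every element.
__global__ void addOne(int *data, int n)
{
    int threadsPerBlock = blockDim.x * blockDim.y;
    int local = flattenThread(threadIdx.x, threadIdx.y, blockDim.x);
    long long i = (long long)blockIdx.x * threadsPerBlock + local;
    if (i < n) data[i] += 1;   // guard the partially filled last block
}

int main(int argc, char **argv)
{
    if (argc != 4) {
        fprintf(stderr, "usage: %s {number of elements} {rows} {cols}\n", argv[0]);
        return 1;
    }
    int n = atoi(argv[1]);
    int rows = atoi(argv[2]);
    int cols = atoi(argv[3]);
    if (n <= 0 || rows <= 0 || cols <= 0 ||
        rows > 1024 || cols > 1024 || rows * cols > 1024) {
        fprintf(stderr, "need n > 0, rows > 0, cols > 0 and rows * cols <= 1024\n");
        return 1;
    }

    int threadsPerBlock = rows * cols;
    dim3 block(cols, rows);   // assumption: x = cols, y = rows
    dim3 grid((unsigned int)(((long long)n + threadsPerBlock - 1) / threadsPerBlock));

    size_t bytes = (size_t)n * sizeof(int);
    int *h = (int *)malloc(bytes);
    if (h == NULL) { fprintf(stderr, "host allocation failed\n"); return 1; }
    for (int i = 0; i < n; ++i) h[i] = i;

    int *d = NULL;
    cudaError_t err = cudaMalloc((void **)&d, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        addOne<<<grid, block>>>(d, n);
        err = cudaGetLastError();   // reports an invalid launch configuration
    }
    if (err == cudaSuccess) err = cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        free(h);
        return 1;
    }

    // Every element started as i, so it should now be i + 1.
    int mismatches = 0;
    for (int i = 0; i < n; ++i) if (h[i] != i + 1) ++mismatches;
    printf("block = %d x %d, grid = %u, mismatches = %d\n",
           rows, cols, grid.x, mismatches);
    free(h);
    return mismatches == 0 ? 0 : 1;
}
```

The grid stays one-dimensional in this sketch; only the block is 2-D, which is all the exercise asks for.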
Q2 [XOR based Checksum]
Given N randomly generated numbers {X1, X2, ..., XN} as input, compute the XOR sum as follows:
SUM = X1 XOR X2 XOR X3 XOR ... XOR XN
You may use at most O(1) extra space to perform the operations on the GPU. Further, the input copied to device memory does not need to retain its original values after the program completes, so the device copy may be overwritten in place.
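One possible way to meet the O(1) extra-space constraint is to fold the array onto itself in global memory, halving the live range with each kernel launch until the result sits in element 0. The sketch below is only one such approach, not necessarily the intended solution; the names `xorFold`, `xorSumInPlace`, and `xorSumGpu`, the block size of 256, and the fixed sample input are all assumptions made for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// One folding step: XOR the upper part of the live range into the lower part.
// Reads come from [half, half + pairs), writes go to [0, pairs); since
// pairs <= half, the two ranges never overlap and no thread races another.
__global__ void xorFold(unsigned int *data, size_t pairs, size_t half)
{
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < pairs) data[i] ^= data[i + half];
}

// Reduces d_data[0..n) in place (the input is destroyed) and returns the XOR sum.
// No extra device memory is allocated; the host keeps only a few scalars.
unsigned int xorSumInPlace(unsigned int *d_data, size_t n)
{
    if (n == 0) return 0;
    const unsigned int threads = 256;          // assumed block size
    while (n > 1) {
        size_t half = (n + 1) / 2;             // size of the range that survives
        size_t pairs = n / 2;                  // elements that have a partner
        unsigned int blocks = (unsigned int)((pairs + threads - 1) / threads);
        xorFold<<<blocks, threads>>>(d_data, pairs, half);
        n = half;                              // for odd n the middle element is carried over
    }
    unsigned int result = 0;
    // cudaMemcpy waits for the preceding kernels on the default stream.
    cudaMemcpy(&result, d_data, sizeof(result), cudaMemcpyDeviceToHost);
    return result;
}

// Hypothetical convenience wrapper: copy to the device, reduce, free.
unsigned int xorSumGpu(const unsigned int *h, size_t n)
{
    if (n == 0) return 0;
    unsigned int *d = NULL;
    size_t bytes = n * sizeof(unsigned int);
    if (cudaMalloc((void **)&d, bytes) != cudaSuccess ||
        cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        fprintf(stderr, "CUDA allocation or copy failed\n");
        exit(1);
    }
    unsigned int result = xorSumInPlace(d, n);
    cudaFree(d);
    return result;
}

int main(void)
{
    // Fixed sample input, chosen only to exercise an odd element count.
    unsigned int v[] = {0x12u, 0x34u, 0x56u, 0x78u, 0x9au};
    size_t n = sizeof(v) / sizeof(v[0]);

    unsigned int cpu = 0;
    for (size_t i = 0; i < n; ++i) cpu ^= v[i];

    unsigned int gpu = xorSumGpu(v, n);
    printf("GPU XOR sum = 0x%x, CPU reference = 0x%x\n", gpu, cpu);
    return gpu == cpu ? 0 : 1;
}
```

This design uses about log2(N) kernel launches. Finishing the last few levels inside a single block would reduce launch overhead, at the cost of a slightly more involved kernel.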
Your program should take two command-line arguments: the number of elements and a random seed. The main function should call srand(seed) and generate the random inputs before invoking the GPU kernel (refer to the class examples). Your program should print the final output to the console.
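The host-side flow described above might look like the following sketch. It covers only argument parsing, seeding, input generation, and the copy to device memory; the GPU kernel itself is left as a comment placeholder, and a CPU reference sum is printed so the GPU result can be checked against it. The helper names `generateInputs` and `xorSumHost` are hypothetical, and storing the values as unsigned int is an assumption.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Seeds the C random number generator and returns n values drawn from rand().
std::vector<unsigned int> generateInputs(size_t n, unsigned int seed)
{
    srand(seed);
    std::vector<unsigned int> v(n);
    for (size_t i = 0; i < n; ++i) v[i] = (unsigned int)rand();
    return v;
}

// Sequential CPU reference: X1 XOR X2 XOR ... XOR XN.
unsigned int xorSumHost(const std::vector<unsigned int> &v)
{
    unsigned int sum = 0;
    for (size_t i = 0; i < v.size(); ++i) sum ^= v[i];
    return sum;
}

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s {number of elements} {seed}\n", argv[0]);
        return 1;
    }
    size_t n = (size_t)strtoull(argv[1], NULL, 10);
    unsigned int seed = (unsigned int)strtoul(argv[2], NULL, 10);
    if (n == 0) {
        fprintf(stderr, "number of elements must be positive\n");
        return 1;
    }

    // Inputs are generated on the host before any kernel is invoked.
    std::vector<unsigned int> h = generateInputs(n, seed);

    unsigned int *d = NULL;
    size_t bytes = n * sizeof(unsigned int);
    if (cudaMalloc((void **)&d, bytes) != cudaSuccess ||
        cudaMemcpy(d, h.data(), bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        fprintf(stderr, "CUDA allocation or copy failed\n");
        return 1;
    }

    // Placeholder: the XOR kernel would be launched here on d, and its
    // result copied back and printed. It is deliberately not shown.

    cudaFree(d);
    printf("CPU reference XOR sum = %u\n", xorSumHost(h));
    return 0;
}
```

Because srand(seed) fully determines the sequence returned by rand() on a given platform, running the program twice with the same arguments should produce the same inputs, which makes GPU results reproducible and easy to compare.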