Assignment #2: The Big Dot II
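The dot product of two vectors $\mathbf{a} = (a_0, a_1, \ldots, a_{n-1})$ and $\mathbf{b} = (b_0, b_1, \ldots, b_{n-1})$, written $\mathbf{a} \cdot \mathbf{b}$, is simply the sum of the component-by-component products:
$$\mathbf{a} \cdot \mathbf{b} = \sum_{i=0}^{n-1} a_i \times b_i$$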
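As a point of comparison for the GPU versions, the sum above can be sketched as a sequential host-side loop. The function name `dot_reference` and the choice to accumulate in double precision are assumptions made for illustration; the assignment itself does not prescribe a reference implementation.

```cuda
#include <cstddef>

// Sequential reference: sum of a[i] * b[i] for i = 0 .. n-1.
// Accumulating in double is an assumption made here so the value can serve
// as a baseline when checking single-precision GPU results.
double dot_reference(const float* a, const float* b, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += static_cast<double>(a[i]) * static_cast<double>(b[i]);
    }
    return sum;
}
```

Because single-precision sums depend on the order of the additions, a GPU result is better compared against this value with a small relative tolerance than with exact equality.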
Dot products are used extensively in computing and have a wide range of applications. For instance, in 3D graphics (n = 3), we often make use of the fact that $\mathbf{a} \cdot \mathbf{b} = |\mathbf{a}||\mathbf{b}| \cos\theta$, where $|\cdot|$ denotes vector length and $\theta$ is the angle between the two vectors. In this assignment, you are expected to:
1. Write a CUDA program that computes, in parallel, the dot product of two random single-precision floating-point vectors of size N = 1<<24;
2. Write two kernel functions for the dot product computation on the GPU:
• 1) kernel1: use shared memory and parallel reduction to calculate a partial sum in each thread block. (Transfer all the partial sums from device to host, then add them up on the CPU.)
• 2) kernel2: use shared memory, parallel reduction, and an atomic function or atomic lock to perform the entire computation on the GPU. (Transfer the final dot product result from device to host.)
3. Compare the execution times of kernel1 and kernel2. (Use cudaEventRecord() for the timing.)
4. Turn in your source code on Canvas with a readme.txt that explains whatever I need to know to run your code successfully.
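The steps above might be organized along the following lines. This is a minimal sketch under stated assumptions, not a reference solution: the helper names `block_sum`, `DotResult`, `run_dot`, and `compare_kernels`, the block size `THREADS = 256`, and the caller-chosen `blocks` count are all hypothetical. It uses `atomicAdd` for kernel2 (the atomic-function option rather than an atomic lock), and it omits error checking on the CUDA calls for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int THREADS = 256;  // assumed block size; must be a power of two

// Shared by both kernels: returns the sum of products handled by this block.
__device__ float block_sum(const float* a, const float* b, int n) {
    __shared__ float cache[THREADS];
    // Grid-stride loop: each thread accumulates its share of the products.
    float local = 0.0f;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        local += a[i] * b[i];
    cache[threadIdx.x] = local;
    __syncthreads();
    // Tree reduction in shared memory; the active range halves each step.
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) cache[threadIdx.x] += cache[threadIdx.x + s];
        __syncthreads();
    }
    return cache[0];  // readable by every thread after the final barrier
}

// kernel1: writes one partial sum per block; the host adds them up.
__global__ void kernel1(const float* a, const float* b, float* partial, int n) {
    float sum = block_sum(a, b, n);
    if (threadIdx.x == 0) partial[blockIdx.x] = sum;
}

// kernel2: blocks combine their sums on the device with atomicAdd.
__global__ void kernel2(const float* a, const float* b, float* result, int n) {
    float sum = block_sum(a, b, n);
    if (threadIdx.x == 0) atomicAdd(result, sum);
}

struct DotResult { float value; float ms; };

// Runs one variant on device pointers d_a and d_b and times the kernel launch
// plus the copy back to the host with CUDA events. use_atomic selects kernel2.
DotResult run_dot(const float* d_a, const float* d_b, int n, int blocks,
                  bool use_atomic) {
    int count = use_atomic ? 1 : blocks;
    std::vector<float> out(count, 0.0f);
    float* d_out = nullptr;
    cudaMalloc(&d_out, count * sizeof(float));
    cudaMemset(d_out, 0, count * sizeof(float));  // kernel2 needs a zeroed accumulator

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    if (use_atomic) kernel2<<<blocks, THREADS>>>(d_a, d_b, d_out, n);
    else            kernel1<<<blocks, THREADS>>>(d_a, d_b, d_out, n);
    cudaMemcpy(out.data(), d_out, count * sizeof(float), cudaMemcpyDeviceToHost);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    DotResult r{0.0f, 0.0f};
    cudaEventElapsedTime(&r.ms, start, stop);
    // Host-side sum of the partials (a single term for kernel2); note that
    // this loop sits outside the timed region in this sketch.
    for (float p : out) r.value += p;
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_out);
    return r;
}

// Uploads two host vectors and runs both variants on the same data.
void compare_kernels(const std::vector<float>& a, const std::vector<float>& b,
                     int blocks, DotResult* r1, DotResult* r2) {
    int n = static_cast<int>(a.size());
    size_t bytes = n * sizeof(float);
    float *d_a = nullptr, *d_b = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice);
    *r1 = run_dot(d_a, d_b, n, blocks, false);
    *r2 = run_dot(d_a, d_b, n, blocks, true);
    cudaFree(d_a);
    cudaFree(d_b);
}
```

For the assignment, `a` and `b` would each hold N = 1<<24 random floats, and a `main` function would call `compare_kernels` and print both results and times. Whether the host-side summation for kernel1 counts toward its time is a measurement choice worth stating in the readme. The first kernel launch in a process can also carry one-time startup overhead, so an untimed warm-up launch before measuring may give a fairer comparison.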