The purpose of this lab is to familiarize you with the CUDA API by implementing a simple vector addition kernel and its associated setup code.

Prerequisites

Before starting this lab, make sure that:

- You have completed the "Lab Tour with Device Query" MP.
- You have looked over the tutorial document. Chapter 3 of the textbook would also be helpful.

Instructions

Edit the code in the code tab to perform the following:

- Allocate device memory
- Copy host memory to device
- Initialize thread block and kernel grid dimensions
- Invoke CUDA kernel
- Copy results from device to host
- Free device memory
- Write the CUDA kernel

The //@@ comment lines mark where each part of the code belongs.
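
The steps listed above can be sketched roughly as follows. This is a minimal illustration, not the lab's reference solution: the names `vecAdd` and `vecAddOnDevice`, the variable names, and the block size of 256 threads are all assumptions made for this example, and the lab's template may structure the code differently around its //@@ markers. Only standard CUDA runtime calls (`cudaMalloc`, `cudaMemcpy`, `cudaGetLastError`, `cudaFree`) are used.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel name. Each thread adds one pair of elements.
__global__ void vecAdd(const float *in1, const float *in2, float *out, int len) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < len) {  // the last block may reach past the end of the vectors
    out[i] = in1[i] + in2[i];
  }
}

// Hypothetical host helper that walks through the listed steps in order.
// Assumes len > 0. Returns the first CUDA error seen, or cudaSuccess.
cudaError_t vecAddOnDevice(const float *hostIn1, const float *hostIn2,
                           float *hostOut, int len) {
  float *devIn1 = nullptr, *devIn2 = nullptr, *devOut = nullptr;
  size_t bytes = (size_t)len * sizeof(float);

  // Allocate device memory
  cudaError_t err = cudaMalloc((void **)&devIn1, bytes);
  if (err == cudaSuccess) err = cudaMalloc((void **)&devIn2, bytes);
  if (err == cudaSuccess) err = cudaMalloc((void **)&devOut, bytes);

  // Copy host memory to device
  if (err == cudaSuccess)
    err = cudaMemcpy(devIn1, hostIn1, bytes, cudaMemcpyHostToDevice);
  if (err == cudaSuccess)
    err = cudaMemcpy(devIn2, hostIn2, bytes, cudaMemcpyHostToDevice);

  if (err == cudaSuccess) {
    // Initialize thread block and kernel grid dimensions
    dim3 block(256);                           // assumed block size
    dim3 grid((len + block.x - 1) / block.x);  // ceiling division

    // Invoke CUDA kernel
    vecAdd<<<grid, block>>>(devIn1, devIn2, devOut, len);
    err = cudaGetLastError();  // reports launch configuration errors
  }

  // Copy results from device to host (this copy waits for the kernel)
  if (err == cudaSuccess)
    err = cudaMemcpy(hostOut, devOut, bytes, cudaMemcpyDeviceToHost);

  // Free device memory (cudaFree on a null pointer is a no-op)
  cudaFree(devIn1);
  cudaFree(devIn2);
  cudaFree(devOut);
  return err;
}
```

The grid size is rounded up so that every element is covered even when the vector length is not a multiple of the block size, which is why the kernel needs the `i < len` check.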