The purpose of this lab is to familiarize you with the CUDA streaming API by re-implementing the vector addition machine problem to use CUDA streams.

Prerequisites

Before starting this lab, make sure that:

- You have completed the vector addition machine problem

Instruction

Edit the code in the code tab to perform the following:

- Allocate device memory
- Interleave the host-to-device memory copies with kernel execution to hide the transfer latency
- Initialize thread block and kernel grid dimensions
- Invoke CUDA kernel
- Copy results from device to host asynchronously

The //@@ comment lines mark where to place each part of the code.
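The steps listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the lab's skeleton code or reference solution: the names vecAdd, streamedVecAdd, and kStreams, the choice of four streams, and the even split into segments are all hypothetical. It assumes the host buffers are pinned (allocated with cudaMallocHost), since copies from pageable memory generally do not overlap with kernel execution.

```cuda
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Element-wise vector addition: out[i] = in1[i] + in2[i].
__global__ void vecAdd(const float *in1, const float *in2, float *out, int len) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < len) out[i] = in1[i] + in2[i];
}

// On a CUDA error: report it and make the enclosing function return false.
// Sketch-level handling only; resources are not released on failure.
#define CUDA_TRY(call)                                        \
  do {                                                        \
    cudaError_t err_ = (call);                                \
    if (err_ != cudaSuccess) {                                \
      fprintf(stderr, "%s\n", cudaGetErrorString(err_));      \
      return false;                                           \
    }                                                         \
  } while (0)

// Hypothetical helper: computes hC = hA + hB using several CUDA streams.
bool streamedVecAdd(const float *hA, const float *hB, float *hC, int n) {
  assert(n > 0);
  const int kStreams = 4;   // assumed stream count
  const int kBlock = 256;   // threads per block
  const int seg = (n + kStreams - 1) / kStreams;  // elements per stream

  // Allocate device memory for the whole vectors.
  float *dA, *dB, *dC;
  size_t bytes = (size_t)n * sizeof(float);
  CUDA_TRY(cudaMalloc((void **)&dA, bytes));
  CUDA_TRY(cudaMalloc((void **)&dB, bytes));
  CUDA_TRY(cudaMalloc((void **)&dC, bytes));

  cudaStream_t streams[kStreams];
  for (int s = 0; s < kStreams; ++s) CUDA_TRY(cudaStreamCreate(&streams[s]));

  // Each stream copies its segment in, runs the kernel, and copies it back.
  // Work queued in different streams may overlap on the device.
  for (int s = 0; s < kStreams; ++s) {
    int off = s * seg;
    int len = (n - off < seg) ? (n - off) : seg;
    if (len <= 0) continue;  // no elements left for this stream
    size_t segBytes = (size_t)len * sizeof(float);
    CUDA_TRY(cudaMemcpyAsync(dA + off, hA + off, segBytes,
                             cudaMemcpyHostToDevice, streams[s]));
    CUDA_TRY(cudaMemcpyAsync(dB + off, hB + off, segBytes,
                             cudaMemcpyHostToDevice, streams[s]));
    // Thread block and grid dimensions for this segment.
    dim3 block(kBlock);
    dim3 grid((len + kBlock - 1) / kBlock);
    vecAdd<<<grid, block, 0, streams[s]>>>(dA + off, dB + off, dC + off, len);
    CUDA_TRY(cudaMemcpyAsync(hC + off, dC + off, segBytes,
                             cudaMemcpyDeviceToHost, streams[s]));
  }
  CUDA_TRY(cudaGetLastError());       // catch kernel launch errors
  CUDA_TRY(cudaDeviceSynchronize());  // wait for all streams to finish

  for (int s = 0; s < kStreams; ++s) CUDA_TRY(cudaStreamDestroy(streams[s]));
  CUDA_TRY(cudaFree(dA));
  CUDA_TRY(cudaFree(dB));
  CUDA_TRY(cudaFree(dC));
  return true;
}

int main() {
  const int n = 1 << 20;
  size_t bytes = (size_t)n * sizeof(float);
  float *hA, *hB, *hC;
  // Pinned host memory so the asynchronous copies can overlap with kernels.
  if (cudaMallocHost((void **)&hA, bytes) != cudaSuccess ||
      cudaMallocHost((void **)&hB, bytes) != cudaSuccess ||
      cudaMallocHost((void **)&hC, bytes) != cudaSuccess) return 1;
  for (int i = 0; i < n; ++i) { hA[i] = (float)i; hB[i] = 2.0f * i; }

  bool ok = streamedVecAdd(hA, hB, hC, n);
  for (int i = 0; ok && i < n; ++i)
    if (fabsf(hC[i] - (hA[i] + hB[i])) > 1e-5f) ok = false;
  printf("%s\n", ok ? "PASSED" : "FAILED");

  cudaFreeHost(hA); cudaFreeHost(hB); cudaFreeHost(hC);
  return ok ? 0 : 1;
}
```

Issuing the copies and the kernel for one segment into the same stream keeps them ordered, while segments in different streams are free to overlap. Whether any overlap actually occurs depends on the device's copy engines and on the segment size, so the stream count here is only a starting point for experimentation.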