$40
KIT308/408 Multicore Architecture and Programming
Assignment 2 — SIMD
Aims of the assignment
The purpose of this assignment is to give you experience at writing a program using SIMD programming techniques. This assignment will give you an opportunity to demonstrate your understanding of:
structures of arrays;
the where there is one there is many approach (WTIOTIM); the use of SIMD for calculation; and the translation of conditional statements into SIMD.
Forms to request extensions of time to submit assignments are available from the Discipline of ICT office. Requests must be accompanied by suitable documentation and should be submitted before the assignment due date.
Assignment Submission
Your assignment is to be submitted electronically via MyLO and should contain:
An assignment cover sheet;
A .zip (or .rar) containing a Visual Studio Solution containing a project for each attempted stage of the assignment (in the format provided in the downloadable materials provided below).
A document containing:
A table of timing information comparing the original single-threaded code (from assignment 1), the provided base code times, and the Stage 1 & 2 SIMD implementation on each scene file (all running with 8 threads). An analysis of the above timing data.
You do not need to (and shouldn't) submit executables, temporary object files, or images. In particular, you must delete the ".vs" diretory before submission as it just Visual Studio temporary files and 100s of MBs. Do not however delete the "Scenes" folder or the "Outputs" folder (but do delete the images within this one).
Marking
This assignment will be marked out of 100 (NOTE: there are 120 marks available, but a maximum of 100). The following is the breakdown of marks:
Task/Topic
Marks
1. Triangles
80%
A. Structures of Arrays
15%
Correct declarations of data structures for SoA SIMD form of Triangle container data
5%
Correct code to convert AoS container to dynamically declared SoA structures (conversion should happen after the scene has been loaded)
5%
Correct rewriting of isTriangleIntersected function and the calls to it to use SoA code
5%
B. Where there is One there is Many (WTIOTIM)
10%
Correct implementation of WTIOTIM approach to isTriangleIntersected function to calculate distance to nearest object — to be used in the objectIntersection function (in Intersection.cpp)
5%
Correct implementation of WTIOTIM approach to create a short-circuiting isTriangleIntersected function to return (as soon as possible) whether any sphere intersects — to be used in the isInShadow function (in Lighting.cpp)
5%
C. SIMD Conversion of short-circuiting isTriangleIntersected function
45%
Correct conversion of scalar inputs to SIMD
5%
Correct and efficient intersection point t0 calculation
5%
Correct and efficient (e.g. no use of if-statements) handling of cases where no intersection occurs due to ray being parallel(ish) with triangle (i.e. first if-statement)
5%
Correct and efficient (e.g. no use of if-statements) handling of cases where no intersection occurs due to point being outside of triangle (i.e. second and third if-statements)
5%
Correct and efficient (e.g. no use of if-statements) handling of cases where intersection does occur (i.e. fourth if-statement)
5%
Correct and efficient (e.g. no use of if-statements, for-loops, or scalar code) calculation of intersection data
5%
Correct declaration of loop constants before loop
5%
Correct handling of sphere list length not being divisible by 8
5%
No remaining scalar code (except for-loop) and correct final output of SIMD code
5%
D. SIMD Conversion of non-short-circuiting isTriangleIntersected function
10%
Correct and efficient (e.g. no use of if-statements) SIMD calculation of closest intersection (i.e. t)
5%
Correct and efficient (e.g. no use of if-statements) SIMD calculation of corresponding triangle index (i.e. index)
5%
2. Lighting
30%
A. Structures of Arrays
10%
Correct declarations of data structures for SoA SIMD form of Light container data
5%
Correct code to convert AoS container to dynamically declared SoA structures (conversion should happen after the scene has been loaded)
5%
B. SIMD Conversion of lighting functions
20%
Correct SIMD conversion of applyLighting function (NOTE: the calls to isInShadow can be done with a scalar loop)
5%
Correct SIMD conversion of applySpecular function (i.e. so it handles 8 lights)
5%
Correct and efficient (e.g. no use of switch-statements) SIMD conversion of applyDiffuse function and applyCheckboard, applyCircles, and applyWood functions (i.e. to make it handle 8 lights)
10%
Documentation
10%
Outputs showing timing information for Stages 1 & 2 (with 8 threads) on all applicable scene files
5%
Analysis of data (comparison between Stages 1 & 2 and the base code)
5%
Penalties
Failure to comply with submission instructions (eg. no cover sheet, incorrect submission of files, abnormal solution/project structure, etc.)
-10%
Poor programming style (eg. insufficient / poor comments, poor variable names, poor indenting, obfuscated code without documentation, compiler warnings, etc.)
up to -20%
Lateness (-20% for up to 24 hours, -50% for up to 7 days, -100% after 7 days)
up to -100%
Programming Style
This assignment is not focussed on programming style (although it is concerned with efficient code), but you should endeavour to follow good programming practices. You should, for example:
comment your code; use sensible variables names; use correct and consistent indenting; and internally document (with comments) any notable design decisions.
[NOTE: any examples in the provided assignment materials that don't live up to the above criteria, should be considered to be deliberate examples of what not to do and are provided to aid your learning ;P]
The Assignment Task
You are to modify the (square-based) multithreaded implementation of a "simple" raytracer from the first assignment to take advantage of SIMD instructions. As time is tight for this assignment you only need to SIMD-ify the triangle/ray intersection code and the lighting code. This requires changes across multiple files.
To aid you in this conversion, the sphere/ray intersection code has already been translated into SIMD code (by following the techniques required to do stage 1 of the assignment).
From the provided (square-based) multithreaded raytracer implementation, for Stage 1 of the assignment you will create multiple subsequent versions that modify the triangle implementation as follows:
A. Translation of array-of-structures (AoS) with structures-of-arrays (SoA) for the triangle container.
B. Rewritten functions for the triangle/ray intersection tests in a where there is one there is many approach.
C. Optimisation of the triangle/ray intersection test to take advantage of SIMD.
For Stage 2, you will do most of that again, but for lights and the lighting calculations.
Implementation (Stage 1)
The following section describes in detail the steps required to complete Stage 1 of the assignment. If you complete this step, only then should you attempt Stage 2 and do similar steps for lights.
A. Structures of Arrays
This stage involves modifying the Scene struct (Scene.h) to store a duplication of the triangle data via structure of arrays rather than the currently existing array of structures (currently in the triangleContainer variable).
In order to complete this step you will need to:
Create a SoA copy of the data that is stored in triangleContainer in the Scene struct.
NOTE: this means the program has two copies of the triangle data, the original one in AoS form and a second one in SoA form. As this assignment predominately only requires changes to isTriangleIntersected (and the sites of the calls to this function) the rest of the code will still use the original AoS version of the data.
Fill this SoA copy of the sphere data after the Scene struct has been loaded (dynamically allocating memory for the SoA representation).
Rewrite the isTriangleIntersected function to make use of the SoA.
At times this code update may require conversions to and/or from Vector structs to the equivalently stored data in the SoA.
At the end of this stage the program should still produce the same results as the base code. Be thorough in your testing (i.e. test all the scenes) to ensure that everything works correctly before progressing. B. Where there is One there is Many
This stage involves replacing the isTriangleIntersected function with two new versions that use the where there is one there is many paradigm.
In order to complete this step you will need to:
Write two functions that take the entire triangles container (or at least a pointer/reference) as an argument and will perform whatever action took place at the calling site. There are two versions as the isInShadow function (in Lighting.cpp) uses a short-circuited approach, exiting as soon as any triangle collision is discovered, and the objectIntersection function (in Intersection.cpp) finds the closest triangle intersected with (which means it must examine them all).
A version of isTriangleIntersected that returns a boolean depending on whether or not a triangle intersects with the ray (i.e. stopping as soon as one is found).
A version of isTriangleIntersected that returns void and updates its t parameter to be the closest triangle that intersects with the ray.
C. SIMD Conversion of isTriangleIntersected function
This stage involves converting the first of the two new version of the isTriangleIntersected function from Stage 2 (the bool returning one) into a SIMD implementation.
In order to complete this step you will need to:
Convert many of the input parameters (or parts of them) into SIMD-ready values (e.g. the ray starting location and direction need to be converted into a SIMD format).
Step through the SoA in chunks proportional to the number of value stored in each SIMD variable (i.e. 8).
Convert the calculation to use SIMD. Care must be taken to correctly deal with the various conditional expressions in the loop. There are four ifstatements that must be converted to SIMD code via the use of appropriate masking statements.
Carefully deal with the situation where the triangle count isn't equally divisible by the number in each SIMD calculation (e.g. 9 triangle with 4 in each SIMD value).
NOTE: this is likely the hardest step of the assignment.
D. SIMD Conversion of isTriangleIntersected function
This stage involves converting the latter of the two new version of the isTriangleIntersected function from Stage 2 (the void returning one) into a SIMD implementation.
Most of this conversion is identical to step 3, and the code can just be copied from that function. It does however require the addition of:
New variables to keep track of the closest intersection found, and which triangle index it was found at.
Consolidation of the values calculated using SIMD into a single scalar return value. This should be done after the loop finishes.
Hints / Tips
The techniques required to complete each stage rely heavily on work done in the SIMD tutorial — refer to them often.
When implementing the SIMD stage, it's best to SIMD-ify small portions of code at a time (e.g. even a single line is often a lot) and then perform the rest of the calculation through the existing scalar code, or even compare the result of the SIMD version to the existing scalar code.
Again for SIMD, it's helpful to render scenes at a greatly reduced size for testing, with an equally small block size, one sample per pixel, and with one thread (e.g. try using allMaterialsZoom -size 8 8 -blockSize 8 -samples 1 -threads 1). It may even be helpful to fix the number of rays cast (MAX_RAYS_CAST) to be 1, so that reflection and refraction don't occur. This doesn't produce a very nice image, but images can be easily compared visually for problems (the example is only 64 pixels) and printf statements can be used to verify the correctness of calculations without outputting a huge amount of data (the use of debuggers is somewhat perilous in a multithreaded environment, but that shouldn't be an issue here).
When converting conditional statements to masks it can be helpful to do this with scalars first (although C/C++ implementations often use 1 rather than 0xFFFFFFFF for true, so the calculations may be different).
Write functions to output all the elements in a SIMD value for easier printf debugging (I am not a fan of how these types are shown in the debugger).
Documentation
When completing either stage of the assignment you should provide:
timing information for each scene file for the average time taken over 5 runs for a render using 8 threads and a block size of 16. See a later section for an example format for this timing table; and
an explanation of the results (e.g. why there's no difference between the performance of stage 1 compared to the base code, or why a particular implementation works well (or poorly) on a particular scene, etc.).
Sample Timing Table Format
The following table is an example of how the timing information could be presented in an easily comparable fashion.
See the following sections for more information about the scene files / tests to be used.
Test
Average Time (Over 5 Runs) (Milliseconds)
Base Multi-Threaded
Base SIMD Spheres
Stage 1 SIMD Triangles
Stage 2 SIMD Lighting
cornell
1024x1024x1
(TODO)233ms
cornell
1024x1024x4
(TODO)3433ms
cornell
500x300x1
(TODO)607ms
cornell-256lights 512x512x1
(TODO)607ms
allmaterials
1024x1024x1
(TODO)1575ms
5000spheres 960x540x1
(TODO)33856ms
bunny500.txt 1024x1024x1
(TODO)4336ms
bunny10k.txt
256x256x1
(TODO)114316ms
Provided Materials
The materials provided with this assignment contain:
the source code of the base multi-threaded version of the raytracer; a set of scene files to be supplied to the program.
Download the materials as an ZIP file.
Source Code Source Code
The provided code consists of 20 source files.
Raytracing logic:
Raytrace.cpp: this file contains the main function which reads the supplied scene file, begins the raytracing, and writes the output BMP file.
The main render loop, ray trace function, and handling of reflection and refraction is also in this file.
Intersection.h and Intersection.cpp: these files define a datastructure for capturing relevant information at the point of intersection between a ray and a scene object and functions for testing for individual ray-object collisions and ray-scene collisions.
Lighting.h and Lighting.cpp: these files provide functions to apply a lighting calculation at a single intersection point.
Texturing.h and Texturing.cpp: these files provide functions for the reading points from 3D procedural textures. Constants.h: this header provide constant definitions used in the raytracing.
Basic types:
Primitives.h: this header contains definitions for points, vector, and rays. It also provides functions and overloaded operators for performing calculations with vectors and points.
PrimitivesSIMD.h: this header contains operators for many common SIMD functions as well as a definition for Vector8 (as a representation for 8 vectors). It also provides functions and overloaded operators for performing calculations with Vector8s. SceneObjects.h: this header file provides definitions for scene objects (ie. materials, lights, spheres, and triangles).
Colour.h: this header defines a datastructure for representing colours (with each colour component represented as a float) and simple operations on colours, including conversions to/from the standard BGR pixel format.
Scene definition and I/O:
Scene.h and Scene.cpp: the header file contains the datastructure to represent a scene and a single function that initialises this datastructure from a file. The scene datastructure itself consists of properties of the scene and lists of the various scene objects as described above. The implementation file contains many functions to aide in the scene loading process. Scene loading relies upon the functionality provided by the Config class.
Config.h and Config.cpp: this class provide facilities for parsing the scene file. SimpleString.h: this is helper string class used by the Config class.
Image I/O:
ImageIO.h and ImageIO.cpp: these files contain the definitions of functions to read and write BMP files.
Miscellaneous:
Timer.h: this class provides a simple timer that makes use of different system functions depending on whether TARGET_WINDOWS, TARGET_PPU, or TARGET_SPU is defined (we don't use the latter two, but I left this file unchanged in case anyone wanted to see how such cross-platform stuff can be handled).
Executing
The program has the following functionality:
By default it will attempt to load the scene "../Scenes/cornell.txt" and render it at 1024x1024 with 1x1 samples.
By default it will output a file named "../Outputs/[scenefile-name]_[width]x[height]x[sample-level]_[executable-filename].bmp" (e.g. with all the default options, "../Outputs/cornell.txt_1024x1024x1_RayTracerAss1.exe.bmp")
It takes command line arguments that allow the user to specify the width and height, the anti-aliasing level (must be a power of two), the name of the source scene file, the name of the destination BMP file, and the number of times to perform the render (to improve the timing information).
Additionally it accepts some arguments (that are currently unused) for setting the number of threads, whether each thread will tint the area that it renders, and the size of the block to render (only applicable for stage 3).
It loads the specified scene.
It renders the scene (as many times as requested).
It produces timing information for the average time taken to produce a render ignoring all file IO. It outputs the rendered scene as a BMP file.
For example, running the program at the command line with no arguments would perform the first test (as described in the scene file section):
On execution this would produce output similar to the following (as well as writing the resultant BMP file to ../Outputs/cornell.txt_1024x1024x1_RayTracerAss1.exe.bmp): average time taken (1 run(s)): 890ms
Scene Files / Tests
Your testing should use the following scene files and parameters for the image size and anti-aliasing level:
Scene File
Width
Height
Anti-aliasing
Image
cornell.txt
1024
1024
1
cornell.txt
1024
1024
4
cornell.txt
500
300
1
cornell-256lights.txt
512
512
1
allmaterials.txt
1024
1024
1
5000spheres.txt
960
540
1
bunny500.txt
1024
1024
1
bunny10k.txt
256
256
1
You also might find the follow Scene / parameters useful for testing Triangle SIMD code conversion (using cornell-256lights at 8x8x1 is also somewhat useful for Stage 2):
Scene File
Width
Height
Anti-aliasing
Threads
Image
bunny500.txt
8
8
1
1
cornell-256lights.txt
8
8
1
1