You (and an optional teammate) are tasked with writing the fastest possible matrix multiplication program for all machines. That means you cannot target one specific machine, but you are free to research the typical architecture specifications of personal and server machines. To make life easier, you may assume that every machine uses the Intel architecture (x86_64).

Background Reading: Chapter 4.12

The matrix is stored in column-major order. A naïve implementation is given in dgemm-naive.c, and you can run bench-naive to see its output.

    void dgemm( int m, int n, float *A, float *C )
    {
      for( int i = 0; i < m; i++ )
        for( int k = 0; k < n; k++ )
          for( int j = 0; j < m; j++ )
            C[i+j*m] += A[i+k*m] * A[j+k*m];
    }

C is where the result is stored, and all of the calculations use just one matrix, A: the routine accumulates the product of A (m rows, n columns) and its transpose into the m-by-m matrix C. You are required to perform all of the calculations; to make benchmarking easier, no optimization is allowed on this front.

The zip contains the following files:

Makefile: builds and benchmarks the code
benchmark.c: do not modify; it checks results and produces performance numbers
dgemm-naive.c: naïve implementation as shown above
dgemm-optimize.c: your optimization

Choose at most 3 of the following common optimizations (1 per function,