# CIS-4230/5230 Homework #4 (Matrix Multiplication)

Due: Wednesday, April 8, 2020

For this assignment you are to write a program that uses MPI and OpenMP to parallelize the multiplication of two large matrices. I am purposely leaving many of the details in your hands, but here are some hints:

• The CreateSystem program in the LinearEquations project generates a file with random values for a large, square matrix. That program also generates driving vector values. Modify CreateSystem to remove the driving vector values and recompile it (using the C++ compiler). I recommend renaming it as well to prevent future confusion.

• Your program should accept two file names on the command line, with one file for each input matrix. It should write the product matrix to a third file (also specified on the command line). Since these matrices are large, storing them in files will make them much easier to manipulate. See the LinearEquations sample for ideas about the file handling.
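For reference, reading a matrix back in might look something like the sketch below. The file format assumed here (the dimension on the first line, followed by row-major values as text) is only an illustration; match whatever format your modified CreateSystem actually writes.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads a square matrix from a text file whose first line holds the
   dimension and whose remaining values are row-major doubles.
   Returns a heap-allocated array (element [i][j] at index i*size + j),
   or NULL on error. The format is an assumption for illustration. */
double *read_matrix(const char *filename, int *size)
{
    FILE *in = fopen(filename, "r");
    if (in == NULL) return NULL;

    if (fscanf(in, "%d", size) != 1) { fclose(in); return NULL; }

    double *data = malloc((size_t)(*size) * (size_t)(*size) * sizeof(double));
    if (data == NULL) { fclose(in); return NULL; }

    for (int i = 0; i < (*size) * (*size); ++i) {
        if (fscanf(in, "%lf", &data[i]) != 1) {
            free(data);
            fclose(in);
            return NULL;
        }
    }
    fclose(in);
    return data;
}
```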

• You should write a serial version of the program first, to serve as a baseline for performance measurements. Be sure to compile all of your code with suitable optimization turned on (see the Makefiles in the various samples that we've looked at... the Solarium MPI sample, for example).
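The serial baseline can be as simple as a triple loop. The function below is a sketch; the name and the pointer-to-row parameter style are my own choices, not a requirement.

```c
/* Serial baseline: compute a = b * c for size x size matrices.
   The pointer-to-array parameters allow ordinary a[i][j] indexing. */
void multiply_serial(int size,
                     double (*a)[size],
                     double (*b)[size],
                     double (*c)[size])
{
    for (int i = 0; i < size; ++i) {
        for (int j = 0; j < size; ++j) {
            double sum = 0.0;
            for (int k = 0; k < size; ++k) {
                sum += b[i][k] * c[k][j];
            }
            a[i][j] = sum;
        }
    }
}
```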

• You'll probably need to borrow the macros in linear_equations.h to access the variable-sized multi-dimensional arrays. Alternatively, if you commit to using C99 (there is an option for that; see the Makefiles), you can use C99's "variable length array" feature instead. This means you can declare local arrays using a dynamically computed expression (not allowed in the older C90 standard). Thus:

```
int size;
...
// Read the value of 'size' out of a file.
...
double my_matrix[size][size];  // Dimension the array using a dynamic value.
```

One problem you might run into with this is that local arrays have limited size. If you try to create an extremely huge local array, you will likely get a stack overflow. However, you can do things like this:

```
int size;
...
// Read the value of 'size' out of a file.
...
double (*my_matrix)[size] =
    (double (*)[size])malloc( size * size * sizeof(double) );
```

The idea is to create a pointer to an array of size doubles and point it at the allocated block. You can then access the elements of that array in the usual way: my_matrix[i][j]. Note that the cast must be written as (double (*)[size]); in fact, since malloc returns void *, C allows you to omit the cast entirely.
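As a self-contained sanity check that the heap-allocation trick works, the function below (an illustrative helper, not part of the assignment) builds an identity matrix on the heap, indexes it with two subscripts, and returns its trace.

```c
#include <stdlib.h>

/* Demonstration of the heap-allocated "VLA pointer" idiom: allocate a
   size x size matrix with malloc, index it as m[i][j], free it through
   the same pointer. Returns the trace of the identity matrix (== size). */
double identity_trace(int size)
{
    double (*m)[size] = (double (*)[size])malloc(sizeof(double[size][size]));
    if (m == NULL) return -1.0;

    for (int i = 0; i < size; ++i)
        for (int j = 0; j < size; ++j)
            m[i][j] = (i == j) ? 1.0 : 0.0;

    double trace = 0.0;
    for (int i = 0; i < size; ++i)
        trace += m[i][i];

    free(m);
    return trace;
}
```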

• When doing timings, don't include the time spent reading/writing the files.

• Assign each cluster node a set of rows of the result matrix to compute. Each node will need, at minimum, a copy of the entire second operand matrix and the appropriate rows of the first operand matrix. That is, if you are computing A = B*C, you need to send all of C to every node, but only the necessary rows of B to each node. Does this really matter? Think about the asymptotic execution and communication times.
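The bookkeeping for splitting the rows evenly can be kept separate from the MPI calls. The helper below is a hypothetical sketch of one such scheme (contiguous blocks, with the remainder spread over the first ranks); the actual program would pair it with calls such as MPI_Bcast for C and MPI_Scatterv/MPI_Gatherv for the rows of B and A.

```c
/* Computes the contiguous block of rows owned by a given rank when
   `size` rows are divided as evenly as possible among `num_procs`
   ranks. The first (size % num_procs) ranks each receive one extra row.
   Illustrative helper; the distribution scheme is up to you. */
void row_range(int size, int num_procs, int rank, int *first, int *count)
{
    int base  = size / num_procs;   /* rows every rank gets            */
    int extra = size % num_procs;   /* leftover rows for the low ranks */
    *count = base + (rank < extra ? 1 : 0);
    *first = rank * base + (rank < extra ? rank : extra);
}
```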

• Use OpenMP to parallelize the node programs. You should find this to be quite trivial.
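For example, if each node's share of the work is a loop over its assigned rows, the iterations are independent and a single pragma parallelizes them (compile with -fopenmp; without that flag the pragma is simply ignored and the code runs serially).

```c
/* OpenMP version of the per-node multiply: compute `rows` rows of the
   product a = b * c, where b holds this node's rows of the first
   operand and c is the full second operand. */
void multiply_omp(int rows, int size,
                  double (*a)[size],
                  double (*b)[size],
                  double (*c)[size])
{
    #pragma omp parallel for
    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < size; ++j) {
            double sum = 0.0;
            for (int k = 0; k < size; ++k) {
                sum += b[i][k] * c[k][j];
            }
            a[i][j] = sum;
        }
    }
}
```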

• Since matrix multiplication is an O(n³) operation, like Gaussian elimination, I would expect that similarly sized matrices could be used during testing and timing.

## Submission

Submit your final program to Canvas.