# CIS-4230 Homework #3 (Gaussian Elimination)

Due: Monday, February 28, 2018

Modify the "simple" pthreads version of the Gaussian elimination program to use barriers. In the zip archive LinearEquations.zip you will find a serial version of the program and a simple pthreads version. The file linear_equations-barriers.c is a copy of the simple pthreads version. Modify this file to create the threads only once and use barriers to guide their execution as discussed in class.

Measure the speed up of your program relative to the serial version and compare it to that obtained by the simple pthreads version. Do the barriers provide any improvement? Measure the speed ups for different problem sizes running from 1000x1000 to 2000x2000 in five evenly spaced steps (1000x1000, 1200x1200, etc.). Make a graph showing speed up vs problem size for both the simple pthreads version and your barrier version.

As the problem size increases you might expect the speed up to improve since the overhead of managing the threads will be proportionally smaller as the amount of work the threads have to do increases. However, for the larger problem sizes, the matrix of coefficients will exceed the size of the processor's memory cache and memory access efficiency will decline. That will tend to make the speed up worse. What results do you see?

Information about the processors in lemuria is here. Note that lemuria contains two of these processors (and thus two processor caches).

## CIS-5230

If you are a graduate student, you should also do the following:

To improve the memory access patterns of the program, let's try a simple device. Each time the threads do a round of work they should switch directions. For example, during the first round each thread should process rows from top to bottom. Then during the next round each thread should process rows from bottom to top, and continue alternating directions from round to round. Thus the data loaded into the cache at the end of round n will be immediately available at the beginning of round n+1.

Modify your barriers version of the program to use this approach and remeasure the speed ups. Do you observe any improvement? You would expect improvement, if any, to occur for the larger problem sizes where the matrix of coefficients is too large for the cache.

## Submission

Submit your modified linear_equations-barriers.c file along with the results of your timings, graphs, and associated comments in a zip archive to Moodle.

OPTIONAL: You might rerun the experiment on a cluster node to see if and how it behaves differently. Note that the cluster node processors are these.