CIS-4230 Homework #2 (Gaussian Elimination)

Due: Thursday, February 13, 2020

Modify the serial version of the Gaussian elimination program to parallelize it using pthreads, following the discussion in class. I recommend naming the modified program linear_equations-pthreads.c (thus keeping a copy of the original program). In the zip archive you will find the serial version of the program. You might have to modify the Makefile to adjust the path to where the Spica library is located.

Measure the speed up of your program relative to the serial version. Measure the speed ups for different problem sizes running from 1000x1000 to 2000x2000 in six evenly spaced steps (1000x1000, 1200x1200, etc.). Make a graph showing speed up vs problem size.

As the problem size increases you might expect the speed up to improve since the overhead of managing the threads will be proportionally smaller as the amount of work the threads have to do increases. However, for the larger problem sizes, the matrix of coefficients will exceed the size of the processor's memory cache and memory access efficiency will decline. What results do you see?

Information about the processors in lemuria is here. Note that lemuria contains two of these processors (and thus two processor caches).


If you are a graduate student, you should also do the following:

To improve the memory access patterns of the program, let's try a simple device. Each time the threads do a round of work they should switch directions. For example, during the first round each thread should process rows from top to bottom. Then during the next round each thread should process rows from bottom to top, and continue alternating directions from round to round. Thus the data loaded into the cache at the end of round n will be immediately available at the beginning of round n+1.

Modify both the serial and paralle versions of your program to use this approach and remeasure the performances and speed ups. Do you observe any improvement? You would expect improvement, if any, to occur for the larger problem sizes where the matrix of coefficients is too large for the cache.


Submit your modified linear_equations-pthreads.c (and linear_equations.c, if required) file(s) along with the results of your timings, graphs, and associated comments in a zip archive to Canvas.

OPTIONAL: You might rerun the experiment on a cluster node to see if and how it behaves differently. Note that the cluster node processors are these.