CIS-4230 Homework #5 (MPI and OpenMP)

Due: Wednesday, April 18, 2018

In this assignment you will take the serial Barnes-Hut implementation in Solarium and parallelize it, using MPI to distribute the computation across the cluster and OpenMP to parallelize the work within each cluster node.

Warning: If you are taking CIS-5230, be sure to read the CIS-5230 specific section below before continuing with this section!

Do the following steps:

  1. If you have not done so already, clone the Solarium repository using a command such as:

            $ git clone https://github.com/pchapin/solarium.git Solarium
          

    You should do this so the Solarium folder that is created is a sibling of the Spica folder. If you already have a clone of Solarium on your system, be sure it is updated to the latest version by changing into the top level folder of the project and using the command:

            $ git pull
          

    If you are interested in creating a branch for your work, see my document on using Git at VTC for more information. Creating a branch is not required, but it can be a nicer way to work.

  2. Baseline the current serial version of the Barnes-Hut implementation (in the Barnes-Hut folder). To do this I suggest you first change the value of OBJECT_COUNT in Common/global.h to increase the number of objects involved in the calculation by a factor of three (to 3000). This larger value will help swamp communication overheads on the cluster and give a more satisfying and compelling result. You should also edit the main.c file in Barnes-Hut so the computation stops after just 1000 time steps. Otherwise it will take a long time to run your tests. Be sure to remake the files in Common/global and remake the main program in Barnes-Hut.

    Measure how long the serial version takes on one of the nodes and record the result.

  3. Using the code in the MPI folder as a guide, convert the Barnes-Hut version to use MPI. Let every node build the full octree at each time step, and focus instead on parallelizing the computations. Using as many nodes as possible, rerun the program and measure its performance. What speedup do you see compared to the serial version? Review my notes on setting up MPI on lemuria for more information on how to run MPI programs.

  4. Make a further modification to your code that uses OpenMP on each node to parallelize the execution of that node's computational loop. Make a third measurement to see what improvement, if any, that provides. Calculate the overall speedup relative to the serial version.
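
One way to picture steps 3 and 4 together is shown below. This is only a sketch, not the Solarium code: the names BlockRange, block_range, block_layout, and compute_slice are hypothetical, and the arithmetic in the loop is a placeholder for the real per-object computation. It assumes each MPI rank owns one contiguous block of the objects and that an OpenMP parallel for divides that block among the threads on the node. The counts and displacements it produces have the shape that MPI_Allgatherv expects for its recvcounts and displs arguments, which is one possible way for the ranks to exchange results after each time step.

```c
#include <assert.h>

/* Half-open range [first, last) of object indices owned by one rank. */
typedef struct {
    int first;
    int last;
} BlockRange;

/* Split `count` objects as evenly as possible over `nranks` ranks.
   The first (count % nranks) ranks each receive one extra object. */
BlockRange block_range(int rank, int nranks, int count)
{
    assert(nranks > 0 && rank >= 0 && rank < nranks && count >= 0);
    int base  = count / nranks;
    int extra = count % nranks;
    BlockRange r;
    r.first = rank * base + (rank < extra ? rank : extra);
    r.last  = r.first + base + (rank < extra ? 1 : 0);
    return r;
}

/* Fill per-rank element counts and starting offsets (for example, for use
   as the recvcounts and displs arguments of MPI_Allgatherv). */
void block_layout(int nranks, int count, int *counts, int *displs)
{
    for (int rank = 0; rank < nranks; ++rank) {
        BlockRange r = block_range(rank, nranks, count);
        counts[rank] = r.last - r.first;
        displs[rank] = r.first;
    }
}

/* Hypothetical stand-in for the computational loop over this rank's block.
   Each iteration writes only accel[i], so iterations are independent.
   Without -fopenmp the pragma is ignored and the loop runs serially. */
void compute_slice(BlockRange r, const double *mass, double *accel)
{
    #pragma omp parallel for schedule(dynamic, 16)
    for (int i = r.first; i < r.last; ++i) {
        accel[i] = 2.0 * mass[i];   /* placeholder arithmetic only */
    }
}
```

The dynamic schedule is a guess on my part: tree walks can cost different amounts for different objects, so it may balance the load better than the default, but you should measure both before settling on one.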

Submit your modified Barnes-Hut program (in a zip archive if necessary) to Moodle. Include your timings as a write-up either in a separate file or as comments in one of the source files (as long as they are obvious).
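
For the write-up, speedup is conventionally the serial time divided by the parallel time, and efficiency is the speedup divided by the number of workers (nodes, or nodes times threads per node). The helpers below are a hypothetical illustration of that arithmetic only; they are not part of Solarium, and any numbers you pass them must come from your own measurements.

```c
#include <assert.h>

/* Speedup of a parallel run relative to the serial baseline.
   Both times must be in the same units (for example, seconds). */
double speedup(double t_serial, double t_parallel)
{
    assert(t_serial > 0.0 && t_parallel > 0.0);
    return t_serial / t_parallel;
}

/* Parallel efficiency: speedup per worker. A value of 1.0 would mean
   perfect scaling; communication overhead normally makes it lower. */
double efficiency(double t_serial, double t_parallel, int workers)
{
    assert(workers > 0);
    return speedup(t_serial, t_parallel) / workers;
}
```

Reporting efficiency alongside speedup makes it easier to see how much of the cluster's capacity the MPI version and the hybrid version each put to use.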

CIS-5230

Do the following steps:

  1. Clone the Solarium repository (or update your existing clone) as described above.

  2. Build and execute the serial version of the Barnes-Hut program as described above.

  3. Before making any changes to the program, profile it to explore the question of how the execution time is distributed between the octree construction step and the computational step. This entails editing the Makefile to add appropriate options to the compiler, running the program with profiling enabled, and then running gprof and interpreting the results. It would be best to do this on one of the nodes. Record your observations to include in your write-up. Warning: The program is likely to take significantly longer to run when being profiled. This is normal.

  4. Do the other steps described above for converting the program to an MPI/OpenMP hybrid.

  5. Even if the octree construction is found to have an insignificant effect on overall performance, I want you to consider the problem of building the tree in parallel as an exercise. This will entail parallelizing the loop that inserts objects into the tree. You also need to add a lock to each tree node to ensure that two threads do not manipulate the same node (or its immediate children) at the same time. Since you are using OpenMP threads, you need to be sure the locks are compatible with OpenMP. See this page for some quick hints. The OpenMP specification contains more details.

    Measure the performance of your adjusted program and include the results in your write-up.
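
The locking idea in step 5 can be sketched as follows. This is a deliberately simplified illustration, not the Solarium octree: it models only a single level of eight octants rather than a full recursive tree, and the names Octant, octant_index, octants_init, octants_destroy, and insert_all are hypothetical. What it does show is the OpenMP lock routines (omp_init_lock, omp_set_lock, omp_unset_lock, omp_destroy_lock) guarding each node so that two threads never update the same node at once. The fallback definitions exist only so the sketch also builds without -fopenmp.

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback (an assumption for portability of this sketch only). */
typedef int omp_lock_t;
static void omp_init_lock(omp_lock_t *l)    { *l = 0; }
static void omp_set_lock(omp_lock_t *l)     { *l = 1; }
static void omp_unset_lock(omp_lock_t *l)   { *l = 0; }
static void omp_destroy_lock(omp_lock_t *l) { (void)l; }
#endif

/* One tree node: a lock plus the data that the lock protects. */
typedef struct {
    omp_lock_t lock;
    double total_mass;
    int count;
} Octant;

/* Octant number 0..7 from the position relative to the center (cx, cy, cz). */
int octant_index(double x, double y, double z,
                 double cx, double cy, double cz)
{
    return (x >= cx ? 1 : 0) | (y >= cy ? 2 : 0) | (z >= cz ? 4 : 0);
}

void octants_init(Octant oct[8])
{
    for (int k = 0; k < 8; ++k) {
        omp_init_lock(&oct[k].lock);
        oct[k].total_mass = 0.0;
        oct[k].count = 0;
    }
}

void octants_destroy(Octant oct[8])
{
    for (int k = 0; k < 8; ++k) {
        omp_destroy_lock(&oct[k].lock);
    }
}

/* Insert n objects in parallel around a center at the origin. Each thread
   locks only the octant it is about to modify and releases it right away. */
void insert_all(Octant oct[8], int n, const double *x, const double *y,
                const double *z, const double *mass)
{
    assert(n >= 0);
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        int k = octant_index(x[i], y[i], z[i], 0.0, 0.0, 0.0);
        omp_set_lock(&oct[k].lock);
        oct[k].total_mass += mass[i];
        oct[k].count += 1;
        omp_unset_lock(&oct[k].lock);
    }
}
```

In a real recursive tree an insertion descends through several nodes, so you will also have to decide when to release a parent's lock relative to acquiring a child's. The sketch sidesteps that question by having only one level.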

Submit your final program and write-up to Moodle as described above.