Asynchronous MPI Notes

Here is a nice paper on a library called APSM (Asynchronous Progress Support for MPI), which provides asynchronous progress for non-blocking point-to-point operations in MPI implementations:

arXiv:1302.4280

Motivating example: a simple communication overlap benchmark. The benchmark comes from co-author Wellein (of the paper above) and was recast in a nice parallel computing lab module by Rich Vuduc here.

I like this example because it is the simplest demonstration of asynchronous MPI progress, with no extra machinery to get in the way.

If there is no asynchronous progress, we would expect the total time t_t to be the sum of the communication time t_c and the time t_w spent in independent computation (work):

t_t = t_c + t_w

With true asynchronous progress, the communication can be hidden underneath the computation:

t_t = max(t_c, t_w)
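
To make the structure concrete, here is a minimal sketch of such a benchmark in C. The file name, message size, and dummy workload are my own illustrative choices, not the lab module's actual code: each rank posts a non-blocking transfer, does a fixed amount of independent work, and only then calls MPI_Wait. Comparing the measured t_t against t_c + t_w and max(t_c, t_w) shows whether the implementation progressed the transfer in the background.

/* overlap.c -- sketch of a communication/computation overlap test.
 * Compile: mpicc -O2 overlap.c -o overlap
 * Run with 2 ranks: mpiexec -np 2 ./overlap
 * Names, message size, and workload are illustrative only.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N (1 << 24)            /* message size in doubles (~128 MiB) */

/* Dummy compute kernel that does not touch the message buffer. */
static double do_work(long iters) {
    volatile double s = 0.0;
    for (long i = 0; i < iters; ++i)
        s += 1e-9 * (double)i;
    return s;
}

int main(int argc, char **argv) {
    int provided, rank;
    /* Background progress threads generally require full thread support. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *buf = calloc(N, sizeof(double));
    MPI_Request req;
    long iters = 200000000L;   /* tune so t_w is comparable to t_c */

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    if (rank == 0) {
        MPI_Isend(buf, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
        do_work(iters);                  /* t_w: independent computation */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    } else if (rank == 1) {
        MPI_Irecv(buf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);
        do_work(iters);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    }

    double t_t = MPI_Wtime() - t0;
    if (rank == 0)
        printf("t_t = %.3f s\n", t_t);

    free(buf);
    MPI_Finalize();
    return 0;
}

To estimate t_c and t_w separately, run the same code once with iters set to zero (communication only) and once with the Isend/Irecv and Wait removed (work only).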

With MVAPICH2 (tested with 2.1), the following runtime environment variables need to be set:

mpiexec -env MPICH_ASYNC_PROGRESS 1 -env MV2_ENABLE_AFFINITY 0 ./yourprogram
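
One hedged note: as I understand MPICH-style progress threads (this is my reading, not something quoted from the MVAPICH2 2.1 documentation), the background progress thread typically needs the library to grant MPI_THREAD_MULTIPLE, so it is worth checking the provided thread level at startup. The snippet below uses only generic MPI calls, nothing MVAPICH2-specific.

/* thread_check.c -- report the thread support level the MPI library granted.
 * Generic MPI calls only; nothing here is MVAPICH2-specific.
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
        printf("warning: requested MPI_THREAD_MULTIPLE, got level %d\n", provided);
    MPI_Finalize();
    return 0;
}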