The default MPI installation on Oakley and Ruby, mvapich2/2.1, appears to have a bug that is triggered by certain programs. The symptoms are 1) the program hangs or 2) the program fails with an error related to Allreduce.
To test whether a failure is caused by this issue rather than by an error in your code, set the following environment variable in your batch job: MV2_USE_SHMEM_COLL=0. This setting disables the library's shared-memory optimizations for collective operations. If your program then runs correctly, the problem most likely lies in the MPI library rather than in your code.
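As a minimal sketch, assuming your batch script runs under bash, the test might look like the lines below. The launch line is left as a comment because the launcher invocation and the executable name (my_program is hypothetical) depend on your own job.

```shell
# Disable MVAPICH2's shared-memory collective optimizations for this job only.
# The variable must be set before the MPI program is launched.
export MV2_USE_SHMEM_COLL=0

# Then start the program exactly as you normally do, for example:
# mpiexec ./my_program    # hypothetical executable name
```

In a csh or tcsh script the equivalent would be setenv MV2_USE_SHMEM_COLL 0. Because the setting turns off an optimization, collective-heavy programs may run more slowly with it, so it is best treated as a diagnostic step first.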
There are several workarounds to choose from.