MVAPICH 3.0 hang due to PMI mismatch with Slurm
Applications such as Quantum ESPRESSO, LAMMPS, and NWChem have experienced hangs with MVAPICH 3.0 because of a Process Management Interface (PMI) mismatch. MVAPICH 3.0 was built with PMI-1, while the newer Slurm versions on RHEL 9 use PMI-2. The MVAPICH development team states that using the PMI-1 interface with Slurm's PMI-2 implementation should work, so the hangs may point to a bug in MVAPICH 3.0.
We are currently testing MVAPICH 4.1 and plan to migrate the software stack built on MVAPICH 3.0 to MVAPICH 4.1 over the coming weeks.