Jobs using openmpi/4.1.x-hpcx (or 4.1.x on Ascend) might hang while writing files on a shared file system. This issue is caused by a bug involving the default OMPIO I/O module and the UCX library. We have identified ORCA as being affected by this problem. If you are experiencing this issue, please consider one of the following solutions:
- Change the I/O module to ROMIO by adding
  export OMPI_MCA_io=romio321
  to your job script.
- Switch to OpenMPI 5. You can check for available OpenMPI 5 modules via
  module spider openmpi/5
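As a rough sketch of the first workaround, the fragment below shows where the export could sit in a job script: after the MPI module is loaded and before the MPI program is launched. The commented-out module line, the `ompi_info` check, and the application name `my_mpi_app` are illustrative assumptions, not part of the original notice. Adapt them to your own job.

```shell
#!/bin/bash
# Illustrative job script fragment (assumption: scheduler directives and
# the actual module version are supplied by you and are omitted here).

# module load openmpi/4.1.x-hpcx   # replace with the version you actually use

# Workaround from the notice: select ROMIO instead of the default OMPIO I/O module.
export OMPI_MCA_io=romio321

# Optional sanity check: print the setting so it shows up in the job log.
echo "OMPI_MCA_io=${OMPI_MCA_io}"

# Optional: if ompi_info is on PATH, list the I/O components this build provides.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info | grep "MCA io" || true
fi

# Launch the MPI program only after the export above, for example:
# mpiexec ./my_mpi_app   # my_mpi_app is a placeholder name
```

The variable has to be exported before the MPI launch so that the launched processes inherit it. Setting it after the launch line has no effect on that run.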