Software

MPI-IO issues on home directories with Intel MPI 2019.3

Certain MPI-IO operations with intelmpi/2019.3 may crash, fail or proceed with errors on the home directory. We do not expect the same issue on our GPFS file system, such as the project space and the scratch space. The problem might be related to the known issue reported by HDF5 group. Please read the section "Problem Reading A Collectively Written Dataset in Parallel" from HDF5 Known Issues for more detail.

Affected versions

2019.3

Using mpiexec/mpirun with Intel MPI on Slurm

Intel MPI on Slurm batch system is configured to support PMI process manager. It is recommended to use srun as MPI program launcher. If you prefer using mpiexec/mpirun over Hydra process manager with Slurm, please add following code to the batch script before running any MPI executable:

unset I_MPI_PMI_LIBRARY I_MPI_HYDRA_BOOTSTRAP
export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=0   # the option -ppn only works if you set this before

Multiple definition error

GNU compiler versions 10+ may have C compiler errors like

/.libs/libmca_mtl_psm.a(mtl_psm_component.o): multiple definition of `mca_mtl_psm_component'

This is a common mistake in C is omitting extern when declaring a global variable in a header file. In previous GCC versions this error is ignored. GCC 10 defaults to -fno-common, which means a linker error will now be reported. It is bypassable by appending the -fcommon to compilation flags.

Type mismatch error

GNU compiler versions 10+ may have Fortran compiler errors like

Error: Type mismatch between actual argument at (1) and actual argument at (2) (REAL(4)/REAL(8))

that result in a error response during configuration

Intermittent failure of default CPU binding

The default CPU binding for ORCA jobs can fail sporadically. The failure is almost immediate and produces a cryptic error message, e.g.

OpenMPI-HPCX 4.1.x hangs on writing files on a shared file system

Your job utilizing openmpi/4.1.x-hpcx (or 4.1.x on Ascend) might hang while writing files on a shared file system. This issue is caused by a bug stemming from the default OMPIO I/O module and UCX library. We have identified ORCA as being affected by this problem. If you are experiencing this issue, please consider the following solutions:

OpenMPI 4 and NVHPC MPI Compatibility Issues with SLURM HWLOC

A pure MPI application using mpirun or mpiexec with more ranks than the number of NUMA nodes may encounter an error similar to the following:

LS-DYNA mpp-dyna Cardinal: Remote access error on mlx5_0:1, RDMA_READ

You may encounter the following error while running mpp-dyna jobs with multiple nodes:

GCC 13 compilation errors due to missing headers

Users may encounter the following errors when compiling a C++ program with GCC 13:

error: 'uint64_t' in namespace 'std' does not name a type

HCOLL-related failures in OpenMPI applications

Several applications using OpenMPI, including HDF5, Boost, Rmpi, ORCA, and CP2K, may fail with errors such as

mca_coll_hcoll_module_enable() coll_hcol: mca_coll_hcoll_save_coll_handlers failed

Caught signal 11: segmentation fault

We have identified that the issue is related to HCOLL (Hierarchical Collectives) being enabled in OpenMPI.

Search form

Software

MPI-IO issues on home directories with Intel MPI 2019.3

Affected versions

Using mpiexec/mpirun with Intel MPI on Slurm

Multiple definition error

Type mismatch error

Intermittent failure of default CPU binding

OpenMPI-HPCX 4.1.x hangs on writing files on a shared file system

OpenMPI 4 and NVHPC MPI Compatibility Issues with SLURM HWLOC

LS-DYNA mpp-dyna Cardinal: Remote access error on mlx5_0:1, RDMA_READ

GCC 13 compilation errors due to missing headers

HCOLL-related failures in OpenMPI applications

Pages

Upcoming Events

Recent News

Translate

Ohio Department of Higher Education

State Government Links

Education Links

Search form

You are here

Software

Affected versions

Pages

Upcoming Events

Recent News

Translate

State Government Links

Education Links