LS-DYNA mpp-dyna Cardinal: Remote access error on mlx5_0:1, RDMA_READ
You may encounter the following error while running mpp-dyna jobs with multiple nodes:
Users may encounter errors like the following when compiling a C++ program with GCC 13:
error: 'uint64_t' in namespace 'std' does not name a type
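This error typically appears because GCC 13 no longer includes <cstdint> transitively through other standard-library headers, so fixed-width integer types such as std::uint64_t are only declared if the header is included explicitly. A minimal sketch of the fix (the surrounding code is only an example):
#include <cstdint>   // GCC 13 requires this explicit include for std::uint64_t
int main() {
    std::uint64_t counter = 42;   // fails with the error above if <cstdint> is missing
    return static_cast<int>(counter % 2);
}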
Several applications using OpenMPI, including HDF5, Boost, Rmpi, ORCA, and CP2K, may fail with errors such as
mca_coll_hcoll_module_enable() coll_hcol: mca_coll_hcoll_save_coll_handlers failed
or
Caught signal 11: segmentation fault
We have identified that the issue is related to HCOLL (Hierarchical Collectives) being enabled in OpenMPI.
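A common workaround, offered here as a general suggestion rather than the site's permanent fix, is to disable the HCOLL component for the affected run, either for the whole job environment:
export OMPI_MCA_coll_hcoll_enable=0
or per launch (./my_app is a placeholder for your executable):
mpirun --mca coll_hcoll_enable 0 ./my_app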
If you get the following error while creating a Python environment or installing Python packages:
UnavailableInvalidChannel: HTTP 403 FORBIDDEN for channel intel <https://conda.anaconda.org/intel>
you can resolve it by removing the deprecated channel:
conda config --remove channels intel
If you would like to use Intel-hosted packages in your Python environment, you can still access them by adding Intel's own conda channel.
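Intel now distributes its Python packages from its own registry; assuming that channel URL is current, adding it looks like:
conda config --add channels https://software.repos.intel.com/python/conda/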
The newly released NumPy 2.0 includes substantial internal changes, including code migrated from C to C++. These modifications introduce significant backwards-compatibility issues, resulting in numerous breaking changes to both the Python and C APIs. As a consequence, packages built against NumPy 1.x may fail with ImportError messages. To ensure compatibility, these packages must be rebuilt against NumPy 2.0.
Recommendation for Addressing the Issue:
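In line with the note above, affected packages should be rebuilt (or upgraded to releases built) against NumPy 2.0. If that is not immediately possible, a common stopgap, not an official recommendation, is to pin NumPy below 2.0 in the environment:
pip install "numpy<2"
or, in a conda environment:
conda install "numpy<2"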
You may experience a multi-node job hang if the job reaches a module that requires heavy I/O, e.g., MP2 or CCSD. This can also degrade the performance of our GPFS file system. We have identified the cause as an MPI I/O issue in OpenMPI 4.1. To remedy this, we will take the following steps:
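Until updated builds are available, one mitigation that is sometimes suggested for OpenMPI 4.1 MPI-IO problems, given here only as a general sketch and not as the procedure referred to above, is to switch the MPI-IO implementation from the default OMPIO to the ROMIO component shipped with OpenMPI 4.1 (./my_app is a placeholder):
mpirun --mca io romio321 ./my_app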
MPI libraries on Pitzer will change after May 20: we will upgrade MOFED from 4.9 to 5.6 and recompile all OpenMPI and MVAPICH2 builds against the newer MOFED version. Users with their own MPI libraries may see job failures and will need to rebuild their applications against the updated MPI libraries.
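To check whether one of your applications is linked against a self-built MPI library and therefore needs rebuilding, you can inspect its runtime dependencies; my_app is a placeholder binary name:
ldd ./my_app | grep -i -E 'mpi|ucx|ibverbs'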
The default CPU binding for ORCA jobs can fail sporadically. The failure is almost immediate and produces a cryptic error message.
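If the message points at process binding in the MPI launcher, one workaround that is often suggested, offered as an assumption here rather than a documented fix, is to disable CPU binding for the run. ORCA forwards extra mpirun options supplied as a quoted argument after the input file; the paths and file names below are placeholders:
$ORCA_ROOT/orca job.inp "--bind-to none" > job.out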