After the downtime on August 19, 2025, multi-node jobs that use intel-oneapi-mpi/2021.10.0 may fail with UCX errors such as:
UCX ERROR no active messages transport to <no debug data>: self/memory - Destination is unreachable
During the downtime, we upgraded MOFED from 23.10 to 24.10, which also upgraded UCX from 1.16 to 1.18. We suspect that Intel MPI 2021.10.0 does not fully support UCX 1.18 or later. We are still investigating the issue and are working on a permanent solution.
Workaround
As a temporary fix, you can load hpcx/2.17.1 (which includes UCX 1.16) before running your job. If the issue still occurs, try rebuilding your application with hpcx/2.17.1 loaded.
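The workaround can be sketched as a job-script preamble. This is a minimal sketch, not an official template: scheduler directives and your existing module loads are omitted, the helper name ucx_is_suspect is hypothetical, and the version check assumes that "ucx_info -v" prints a line of the form "# Library version: X.Y.Z", which may differ between UCX builds.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (exit status 0) when the given UCX version
# is 1.18 or newer, i.e. the range this notice suspects is affected.
ucx_is_suspect() {
    local major minor
    IFS=. read -r major minor _ <<< "$1"
    (( major > 1 || (major == 1 && minor >= 18) ))
}

# Workaround from this notice: load HPC-X 2.17.1, which includes UCX 1.16.
# Guarded so the script does nothing where environment modules are unavailable.
if command -v module >/dev/null 2>&1; then
    module load hpcx/2.17.1
fi

# Optional sanity check. ucx_info ships with UCX; the output format parsed
# here is an assumption and may need adjusting on your system.
if command -v ucx_info >/dev/null 2>&1; then
    ver=$(ucx_info -v | sed -n 's/^# Library version: //p')
    if ucx_is_suspect "$ver"; then
        echo "warning: UCX $ver is still active; the workaround may not have taken effect" >&2
    fi
fi

# Launch your application here as you normally would.
```

If the warning appears, check with "module list" that hpcx/2.17.1 is actually loaded in the job environment.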