Intel oneAPI MPI 2021.10.0 Fails with UCX 1.18

Category: 
Resolution: Unresolved
Affected Software: 

After the downtime on August 19, 2025, users may encounter UCX errors such as:

UCX ERROR no active messages transport to <no debug data>: self/memory - Destination is unreachable

when running a multi-node job with intel-oneapi-mpi/2021.10.0.

During the downtime, we upgraded MOFED from 23.10 to 24.10, which also upgraded UCX from 1.16 to 1.18. We suspect that Intel MPI 2021.10.0 does not fully support UCX 1.18 or later. We are still investigating this issue and working on a permanent solution.

Workaround

As a temporary fix, you can load hpcx/2.17.1 (which includes UCX 1.16) before running your job. If the issue still occurs, try rebuilding your application with hpcx/2.17.1 loaded.
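To confirm that the workaround has taken effect, it can help to check which UCX version your environment actually picks up. The following is a minimal sketch, not an official diagnostic: the helper name ucx_is_affected is hypothetical, the 1.18 threshold reflects only the suspicion described above, and the use of "module load" assumes a standard environment-modules setup. It relies on the ucx_info utility that ships with UCX and takes the first x.y.z string in its version output.

```sh
#!/bin/sh
# Sketch: warn if the UCX version on the current path is in the
# suspected-affected range (1.18 or newer).
# Assumption: run this after loading modules in your job script, e.g.
#   module load hpcx/2.17.1

# Hypothetical helper: succeeds (status 0) if version $1 is >= 1.18.
ucx_is_affected() {
    ucx_major=${1%%.*}          # text before the first dot
    ucx_rest=${1#*.}            # text after the first dot
    ucx_minor=${ucx_rest%%.*}   # second version component
    [ "$ucx_major" -gt 1 ] || { [ "$ucx_major" -eq 1 ] && [ "$ucx_minor" -ge 18 ]; }
}

# Only run the check where ucx_info is available.
if command -v ucx_info >/dev/null 2>&1; then
    # Take the first x.y.z pattern printed by "ucx_info -v".
    ucx_version=$(ucx_info -v | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)
    if [ -n "$ucx_version" ] && ucx_is_affected "$ucx_version"; then
        echo "UCX $ucx_version detected; load hpcx/2.17.1 before launching the job." >&2
    fi
fi
```

If the warning still appears after hpcx/2.17.1 is loaded, the newer system UCX is probably still being found first; in that case, rebuilding the application with hpcx/2.17.1 loaded, as described above, is the next thing to try.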