Ascend

PyTorch hangs on dual-GPU nodes on Ascend

Through internal testing, we have confirmed that PyTorch can hang on Ascend and that the issue only occurs on the dual-GPU (nextgen) nodes. We are still unsure why setting NCCL_P2P_DISABLE is necessary even for single-node jobs, but the performance impact should be minimal: on the dual-GPU nodes the GPUs sit on separate NUMA nodes and are connected over the SMP interconnect rather than NVLink, so GPU-to-GPU communication already goes through shared memory.
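
If you want to confirm that there is no direct peer-to-peer path between the GPUs on a node, a quick check from Python is possible. This is a minimal sketch; it only assumes PyTorch with CUDA support is available on the node.

```python
import torch

# Minimal sketch: report whether CUDA peer-to-peer access is possible
# between each pair of visible GPUs. On nodes without NVLink or a direct
# PCIe peer path, this is typically not possible, which is why disabling
# NCCL P2P has little performance impact there.
def report_p2p_access():
    n = torch.cuda.device_count()
    for i in range(n):
        for j in range(n):
            if i != j:
                ok = torch.cuda.can_device_access_peer(i, j)
                status = "possible" if ok else "not possible"
                print(f"GPU {i} -> GPU {j}: peer access {status}")

if __name__ == "__main__":
    report_p2p_access()
```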

Workaround

To work around this, set the environment variable NCCL_P2P_DISABLE=1 before launching the job.
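
If you prefer to set the variable from Python rather than in the job script, the sketch below sets it before the distributed process group is created. The torchrun launch and the training loop are assumptions for illustration, not part of the original report.

```python
import os

# NCCL reads NCCL_P2P_DISABLE when the communicator is created, so the
# variable must be set before init_process_group() is called (or exported
# in the job script before launch).
os.environ["NCCL_P2P_DISABLE"] = "1"

import torch
import torch.distributed as dist

def main():
    # Assumes the job is launched with torchrun, which sets RANK,
    # WORLD_SIZE, and LOCAL_RANK for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # ... build the model, wrap it in DistributedDataParallel, train ...

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Exporting the variable in the batch script before launch works just as well, since child processes inherit the environment.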