Known issues

Unresolved known issues

A known issue with an Unresolved Resolution state is an active problem under investigation; a temporary workaround may be available.

Resolved known issues

A known issue with a Resolved (workaround) Resolution state is an ongoing problem for which a permanent workaround is available; the workaround may involve using different software or hardware.

A known issue with a Resolved Resolution state has been corrected.

Known Issues

Intermittent failure of default CPU binding
Category: Software | Resolution: Resolved (workaround)

The default CPU binding for ORCA jobs can fail sporadically. The failure is almost immediate and produces a cryptic error message, e.g.

    $ORCA/orca h2o.in...

Posted: 3 days 9 hours ago | Updated: 3 days 9 hours ago

Multi-node job hang with ORCA 5
Category: Owens, Pitzer, Software | Resolution: Resolved (workaround)

You may experience a multi-node job hang if the job reaches a module that requires heavy I/O, e.g., MP2 or CCSD. This can also contribute to our GPFS performance issues. We have...

Posted: 1 year 4 weeks ago | Updated: 3 days 9 hours ago

OpenMPI-HPCX 4.1.x hangs on writing files on a shared file system
Category: Software | Resolution: Resolved (workaround)

Your job utilizing openmpi/4.1.x-hpcx (or 4.1.x on Ascend) might hang while writing files on a shared file system. This issue is caused by a ...

Posted: 3 days 10 hours ago | Updated: 3 days 10 hours ago

PyTorch hangs on dual-GPU nodes on Ascend
Category: Ascend, GPU | Resolution: Resolved (workaround)

PyTorch can hang on Ascend dual-GPU nodes. Through internal testing, we have confirmed that the hang only occurs on Ascend dual-GPU (nextgen) nodes. We're still unsure why...

Posted: 1 week 2 days ago | Updated: 3 days 10 hours ago

GCC 13 compilation errors due to missing headers
Category: Cardinal, Pitzer, Software | Resolution: Resolved (workaround)

Users may encounter the following errors when compiling a C++ program with GCC 13:

    error: 'uint64_t' in namespace 'std' does not name a type

or

    error: 'std::...

Posted: 6 months 3 weeks ago | Updated: 1 week 6 hours ago

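Note on the GCC 13 header issue above: GCC 13's libstdc++ no longer includes <cstdint> transitively from several standard headers, so code that uses std::uint64_t and similar fixed-width types must include it explicitly. A minimal sketch of the kind of fix involved (the file and variable names are illustrative, not taken from the original report):

    // example.cpp - hypothetical file; build with: g++ -std=c++17 example.cpp
    #include <cstdint>    // must be included explicitly under GCC 13 for std::uint64_t
    #include <iostream>

    int main() {
        std::uint64_t counter = 42;   // without <cstdint>, GCC 13 reports the error shown above
        std::cout << counter << "\n";
        return 0;
    }
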
STAR-CCM+ MPI job failure and workaround
Category: Cardinal, Software | Resolution: Resolved (workaround)

STAR-CCM+ encounters errors when running MPI jobs with Intel MPI or OpenMPI, displaying the following message:

    ib_iface.c:1139 UCX ERROR Invalid active_speed on mlx5_0:1: 128

...

Posted: 6 months 3 weeks ago | Updated: 1 week 6 hours ago

MPI_THREAD_MULTIPLE is not supported with OpenMPI-HPCX 4.x
Category: Owens, Software | Resolution: Resolved

A threaded MPI code that calls MPI_Init_thread with MPI_THREAD_MULTIPLE will fail because UCX from the HPCX package is built without multi-threading enabled. UCX is the...

Posted: 2 years 2 months ago | Updated: 1 week 7 hours ago

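Note on the MPI_THREAD_MULTIPLE issue above: a program can verify at startup whether the thread level it requested was actually granted, rather than failing later inside the MPI library. A minimal sketch using the standard MPI C API from C++ (the program itself is illustrative and not part of the original report):

    // thread_check.cpp - hypothetical example; build with: mpicxx thread_check.cpp
    #include <mpi.h>
    #include <cstdio>
    #include <cstdlib>

    int main(int argc, char** argv) {
        int provided = MPI_THREAD_SINGLE;
        // Request fully multi-threaded MPI; 'provided' reports what the library grants.
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        if (provided < MPI_THREAD_MULTIPLE) {
            std::fprintf(stderr, "MPI_THREAD_MULTIPLE not available (provided=%d)\n", provided);
            MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
        }
        MPI_Finalize();
        return 0;
    }
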
HCOLL-related failures in OpenMPI applications
Category: Cardinal, Software | Resolution: Resolved (workaround)

Several applications using OpenMPI, including HDF5, Boost, Rmpi, ORCA, and CP2K, may fail with errors such as

    mca_coll_hcoll_module_enable() coll_hcol: mca_coll_hcoll_save_coll_handlers...

Posted: 6 months 3 weeks ago | Updated: 1 week 7 hours ago

Handling full-node MPI warnings with MVAPICH 3.0
Category: Ascend, Cardinal | Resolution: Resolved (workaround)

When running a full-node MPI job with MVAPICH 3.0, you may encounter the following warning message:

    [][mvp_generate_implicit_cpu_mapping] WARNING: You appear to be running at full...

Posted: 6 months 1 week ago | Updated: 1 week 7 hours ago

HWloc warning: Failed with: intersection without inclusion
Category: Ascend, Cardinal | Resolution: Resolved (workaround)

When running MPI+OpenMP hybrid code with the Intel Classic Compiler and MVAPICH 3.0, you may encounter the following warning message from hwloc:

    ...

Posted: 6 months 1 week ago | Updated: 1 week 7 hours ago
