Known issues

Unresolved known issues

A known issue with an Unresolved Resolution state is an active problem under investigation; a temporary workaround may be available.

Resolved known issues

A known issue with a Resolved (workaround) Resolution state is an ongoing problem; a permanent workaround is available, which may include using different software or hardware.

A known issue with a Resolved Resolution state has been corrected.

Known Issues

Title Category Resolution Description Posted Updated
NCCL hang on Ascend dual-GPU nodes Ascend, GPU, Software Resolved (workaround)

Users may encounter the following message and experience NCCL hangs if the first operation is a barrier when running multi-GPU training. We have identified...

7 months 3 weeks ago 7 months 2 weeks ago
Resolved: Home directory space Issue with MATLAB 2024a Software Resolved

Users may experience their home directory running out of space after executing multiple MATLAB 2024a jobs. This issue is caused by the accumulation of multiple copies of the MathWorks Service...

7 months 2 weeks ago 7 months 2 weeks ago
MPI-IO issues on home directories with Intel MPI 2019.3 Pitzer, Software Resolved (workaround)

Certain MPI-IO operations with intelmpi/2019.3 may crash, fail, or proceed with errors on the home directory. We do not expect the same issue on our GPFS file system, such as the...

8 months 5 days ago 8 months 5 days ago
Using mpiexec/mpirun with Intel MPI on Slurm Software Resolved (workaround)

Intel MPI on the Slurm batch system is configured to support the PMI process manager. We recommend using srun as the MPI program launcher. If you prefer using mpiexec/...

8 months 5 days ago 8 months 5 days ago
OpenMPI 4 and NVHPC MPI Compatibility Issues with SLURM HWLOC Ascend, Cardinal, Software Resolved (workaround)

A pure MPI application using mpirun or mpiexec with more ranks than the number of NUMA nodes may encounter an error similar to the following:...

10 months 1 day ago 7 months 7 hours ago
Ascend desktop including lightweight is not working Resolved

Update: this is fixed. 

Original Post:

Ascend Desktop, including...

9 months 2 weeks ago 9 months 2 weeks ago
PyTorch hangs on dual-GPU node on Ascend Ascend, GPU Resolved (workaround)

PyTorch can hang on Ascend on dual-GPU nodes

Through internal testing, we have confirmed that the hang issue only occurs on Ascend dual-GPU (nextgen) nodes. We’re still unsure why...

8 months 2 weeks ago 8 months 1 week ago
OpenMPI-HPCX 4.1.x hangs on writing files on a shared file system Software Resolved (workaround)

Your job utilizing openmpi/4.1.x-hpcx (or 4.1.x on Ascend) might hang while writing files on a shared file system. This issue is caused by a ...

8 months 1 week ago 8 months 1 week ago
Intermittent failure of default CPU binding Software Resolved (workaround)

The default CPU binding for ORCA jobs can fail sporadically. The failure is almost immediate and produces a cryptic error message, e.g.

$ORCA/orca h2o.in...

8 months 1 week ago 8 months 1 week ago
OSC will remove Jupyter MATLAB Kernel Cardinal, OnDemand Resolved

OSC will remove the default MATLAB Jupyter Kernel on Tuesday, May 20th, 2025. To create your own Jupyter MATLAB Kernel, please follow the documentation on the MATLAB Page...

8 months 6 days ago 8 months 6 days ago
