Known issues

Unresolved known issues

A known issue with an Unresolved resolution state is an active problem under investigation; a temporary workaround may be available.

Resolved known issues

A known issue with a Resolved (workaround) resolution state is an ongoing problem; a permanent workaround is available, which may include using different software or hardware.

A known issue with a Resolved resolution state has been corrected.

Known Issues

Title Category Resolution Description Posted Updated
Title: PyTorch hangs on dual-gpu node on Ascend
Category: Ascend, GPU
Resolution: Resolved (workaround)
Description:
PyTorch can hang on Ascend on dual-GPU nodes

Through internal testing, we have confirmed that the hang issue only occurs on Ascend dual-GPU (nextgen) nodes. We’re still unsure why...

Posted: 3 hours 46 min ago
Updated: 3 hours 44 min ago

Title: OpenMPI 4 and NVHPC MPI Compatibility Issues with SLURM HWLOC
Category: Ascend, Cardinal, Software
Resolution: Resolved (workaround)
Description:
A pure MPI application using mpirun or mpiexec with more ranks than the number of NUMA nodes may encounter an error similar to the following:...

Posted: 1 month 1 week ago
Updated: 10 hours 13 min ago

Title: Ascend desktop including lightweight is not working
Category: (none)
Resolution: Resolved
Description:
Update: this is fixed.

Original Post:

Ascend Desktop, including...

Posted: 4 weeks 1 day ago
Updated: 4 weeks 6 hours ago

Title: Upcoming Expiration of Intel Compiler Licenses on Pitzer and State-wide Licensing
Category: (none)
Resolution: Resolved
Description:
Old Intel compiler licenses for versions 19.1.3 and earlier, on Pitzer and for state-wide access, will no longer be available as of March 31, 2025. We are currently...

Posted: 1 month 3 weeks ago
Updated: 4 weeks 1 day ago

Title: BWA 0.7.17 vulnerability
Category: Cardinal
Resolution: Resolved (workaround)
Description:
Cardinal hosted a version of bwa, 0.7.17, that had an unpatched vulnerability.

This version has been removed from Cardinal in favor of 0.7.18.

Posted: 2 months 4 days ago
Updated: 2 months 4 days ago

Title: Core label on OnDemand app is incorrect
Category: OnDemand
Resolution: Resolved
Description:
The core label on the OnDemand app incorrectly displays as '1', regardless of the requested number of cores for a job. While this label is incorrect, the job is still allocated the correct number...

Posted: 3 months 3 weeks ago
Updated: 3 months 1 day ago

Title: Core and Node labels on Classroom app are incorrect
Category: (none)
Resolution: Resolved
Description:
The core and node labels on the Classroom app (class.osc.edu) incorrectly display as '0', regardless of the requested number of cores for a job. While this label is incorrect, the job is still...

Posted: 3 months 2 weeks ago
Updated: 3 months 1 day ago

Title: LS-DYNA mpp-dyna Cardinal: Remote access error on mlx5_0:1, RDMA_READ
Category: Cardinal, Software
Resolution: Unresolved
Description:
You may encounter the following error while running mpp-dyna jobs with multiple nodes:

[c0054:22206:0:22206] ib_mlx5_log.c:179  Remote access error on mlx5_0:1/IB (synd 0x13 vend 0x88...

Posted: 3 months 1 week ago
Updated: 3 months 1 week ago

Title: Abaqus Parallel Job Failure with PMPI Due to Out-of-Memory (OOM) Error
Category: Cardinal
Resolution: Resolved (workaround)
Description:
You may encounter the following error while running an Abaqus parallel job with PMPI:

Traceback (most recent call last):
 File "SMAPylModules/SMAPylDriverPy.m/src/driverAnalysis.py",...

Posted: 4 months 6 days ago
Updated: 3 months 1 week ago

Title: Ansys OMP: System error #22: Invalid argument
Category: Cardinal
Resolution: Resolved (workaround)
Description:
You may encounter the following error while running Ansys on Cardinal:

OMP: Error #100: Fatal system error detected.
OMP: System error #22: Invalid argument
forrtl: error (76): Abort...

Posted: 3 months 2 weeks ago
Updated: 3 months 2 weeks ago
