Known issues

Unresolved known issues

Known issue with an Unresolved Resolution state is an active problem under investigation; a temporary workaround may be available.

Resolved known issues

A known issue with a Resolved (workaround) Resolution state is an ongoing problem; a permanent workaround is available which may include using different software or hardware.

A known issue with Resolved Resolution state has been corrected.

Known Issues

Title Category Resolution Description Postedsort ascending Updated
PyTorch hangs on dual-gpu node on Ascend Ascend, GPU Resolved
(workaround)

PyTorch can hang on Ascend on dual-GPU nodes

Through internal testing, we have confirmed that the hang issue only occurs on Ascend dual-GPU (nextgen) nodes. We’re still unsure why... Read more

2 months 4 days ago 1 month 4 weeks ago
Ascend desktop including lightweight is not working Resolved

Update: this is fixed. 

Original Post:

Ascend Desktop, including... Read more

3 months 3 days ago 3 months 2 days ago
OpenMPI 4 and NVHPC MPI Compatibility Issues with SLURM HWLOC Ascend, Cardinal, Software Resolved
(workaround)

A pure MPI application using mpirun or mpiexec with more ranks than the number of NUMA nodes may encounter an error similar to the following:... Read more

3 months 2 weeks ago 2 weeks 1 day ago
Upcoming Expiration of Intel Compiler Licenses on Pitzer and State-wide Licensing Resolved

Old Intel compiler licenses on Pitzer and for state-wide access with versions 19.1.3 and earlier will no longer be available from March 31, 2025. We are currently... Read more

4 months 8 hours ago 3 months 3 days ago
BWA 0.7.17 vulnerability Cardinal Resolved
(workaround)

Cardinal hosted a version of bwa that had an unpatched vulnerability, 0.7.17. 

This version has been removed from Cardinal in favor of 0.7.18

4 months 1 week ago 1 month 2 weeks ago
LS-DYNA mpp-dyna Cardinal: Remote access error on mlx5_0:1, RDMA_READ Cardinal, Software Resolved
(workaround)

You may encounter the following error while running mpp-dyna jobs with multiple nodes:

[c0054:22206:0:22206] ib_mlx5_log.c:179  Remote access error on mlx5_0:1/IB (synd 0x13 vend 0x88... Read more          
5 months 1 week ago 2 weeks 1 day ago
Ansys OMP: System error #22: Invalid argument Cardinal Resolved
(workaround)

You may encounter the following error while running Ansys on Cardinal:

OMP: Error #100: Fatal system error detected.
OMP: System error #22: Invalid argument
forrtl: error (76): Abort... Read more          
5 months 3 weeks ago 1 month 2 weeks ago
Core and Node labels on Classroom app are incorrect Resolved

The core and node labels on the Classroom app (class.osc.edu) incorrectly displays as '0', regardless of the requested number of cores for a job. While this label is incorrect, the job is still... Read more

5 months 3 weeks ago 5 months 5 days ago
Core label on OnDemand app is incorrect OnDemand Resolved

The core label on the OnDemand app incorrectly displays as '1', regardless of the requested number of cores for a job. While this label is incorrect, the job is still allocated the correct number... Read more

5 months 3 weeks ago 5 months 5 days ago
- --gpus-per-task is not working Batch Resolved

Updated: This is fixed. 

Original Post:

After the recent Slurm upgrade, the option --gpus-per-task is currently not functioning as... Read more

5 months 3 weeks ago 5 months 3 weeks ago

Pages