|
Rolling Reboot of all three HPC clusters beginning July 7 |
|
Unresolved |
A rolling reboot is scheduled to remediate a ptrace vulnerability and safely restore debugger functionality for all clusters, including Ascend, Cardinal, and Pitzer... |
2 days 5 hours ago |
2 days 5 hours ago |
|
ptrace Disabled so Debuggers and Tracers Do Not Work |
|
Unresolved |
ptrace has been disabled globally across all OSC systems to mitigate a newly identified Linux kernel vulnerability. This disables most functionality of debuggers,... |
1 month 2 weeks ago |
6 days 12 hours ago |
|
cuMemHostRegister Fails with CUDA_ERROR_INVALID_VALUE on RHEL 9.6 |
Ascend, Cardinal, GPU, system software |
Resolved |
After upgrading the operating system to RHEL 9.6 during the scheduled downtime on May 12, 2026, applications utilizing UCX (... |
1 month 5 days ago |
1 week 1 day ago |
|
STAR-CCM+ OpenMPI Job Failed due to Out-of-Memory |
Cardinal, Software |
Resolved (workaround) |
After the scheduled downtime on May 12, 2026, STAR-CCM+ has been encountering out-of-memory errors when running OpenMPI jobs. A message similar to the following... |
1 month 5 days ago |
1 week 2 days ago |
|
STAR-CCM+ MPI job failure |
Cardinal, Software |
Resolved (workaround) |
STAR-CCM+ encounters errors when running MPI jobs using Intel MPI or Open MPI, displaying the following message:
ib_iface.c:1139 UCX ERROR Invalid active_speed on... |
1 year 8 months ago |
1 week 3 days ago |
|
Rolling Reboot for Security Fix |
|
Resolved |
A rolling reboot is in progress to address CVE-2026-23111 (nf_tables logic bug) for all clusters, including Ascend, Cardinal, and Pitzer. Login nodes will be rebooted first and access... |
3 weeks 4 hours ago |
1 week 3 days ago |
|
Temporary Login Node Instability on Ascend |
Ascend |
Unresolved |
We are currently experiencing temporary instability on the Ascend login nodes, which may result in slow response times or unexpected session disconnects. Our team is actively... |
1 month 4 days ago |
2 weeks 1 day ago |
|
Nsight GPU profiler not working due to DCGM conflict |
GPU, Infrastructure |
Resolved |
UPDATE (Mar 15, 2023)
After the downtime on Mar. 14, 2023, OSC enabled a new Slurm option --gres=nsight. DCGM will be disabled on the nodes for the... |
3 years 3 months ago |
1 month 1 week ago |
|
MATLAB (legacy) r2024a fails to launch on OnDemand Desktop |
Software |
Resolved (workaround) |
As of the downtime on 05/12/26, MATLAB (legacy) r2024a has stopped working when users try to open .m files in the GUI application. At the time, we suggested using the... |
1 month 2 weeks ago |
1 month 2 weeks ago |
|
MVAPICH 3.0 hang due to PMI mismatch with Slurm |
Software |
Resolved (workaround) |
Applications such as Quantum ESPRESSO, LAMMPS, and NWChem experienced hangs with MVAPICH 3.0 due to a PMI mismatch. MVAPICH 3.0 was built with PMI-1, while newer Slurm versions on RHEL 9... |
8 months 4 weeks ago |
1 month 3 weeks ago |