We are currently experiencing outages affecting multiple services, including OnDemand (ondemand.osc.edu) and login nodes of HPC systems.

Known issues

Unresolved known issues

A known issue with an Unresolved Resolution state is an active problem under investigation; a temporary workaround may be available.

Resolved known issues

A known issue with a Resolved (workaround) Resolution state is an ongoing problem; a permanent workaround is available, which may include using different software or hardware.

A known issue with a Resolved Resolution state has been corrected.

Known Issues

Title Category Resolution Description Posted Updated
All HPC systems are available Login Problems, Operations, Outage Resolved

8/24/16 3:57PM: All HPC systems are available, including:

  • Oakley cluster for general access
  • Ruby cluster for restricted access
  • Owens cluster for... Read more
8 years 9 months ago 8 years 9 months ago
Large MPI job startup hang with mvapich2/2.3 and mvapich2/2.3.1 Owens, Pitzer, Software Resolved (workaround)

We have found that large MPI jobs may hang at startup with mvapich2/2.3 and mvapich2/2.3.1 (with any compiler dependency) due to a known bug that has been fixed in release 2... Read more

5 years 7 months ago 1 month 12 hours ago
Brief disruption of GPFS on 8/28/2013 filesystem Resolved

On the morning of August 28th, 2013, we will briefly disrupt the GPFS filesystem to reboot servers. This is necessary to upgrade the GPFS system. The in-place upgrade should only briefly interrupt... Read more

11 years 9 months ago 11 years 9 months ago
OSC OnDemand is not responsive OnDemand Resolved

OSC OnDemand is currently unresponsive. We are investigating the problem. Please use other methods, such as SSH, to connect to OSC HPC systems.

We apologize for any inconvenience this may cause... Read more

5 years 3 months ago 5 years 3 months ago
Oakley login node down login Resolved

One of the Oakley login nodes is down. We are currently working on bringing it back online. SSH connections to oakley.osc.edu may time out. A workaround is to connect directly to oakley01.osc.edu... Read more

11 years 3 months ago 11 years 3 months ago
STAR-CCM+ MPI job failure and workaround Cardinal, Software Resolved (workaround)

STAR-CCM+ encounters errors when running MPI jobs with Intel MPI or OpenMPI, displaying the following message:

ib_iface.c:1139 UCX ERROR Invalid active_speed on mlx5_0:1: 128

... Read more

7 months 4 weeks ago 1 month 1 week ago
Intermittent issue with connecting to batch server Batch, Owens Resolved

Updated on June 18, 2018, at 3:15 PM:

This issue has been fixed. 

Posted on June 18, 2018, at 12:30 PM:

We've been having intermittent... Read more

6 years 12 months ago 6 years 12 months ago
CP2K 6.1 would fail on Pitzer Cascade Lakes (48-core) node: Pitzer Resolved (workaround)

CP2K 6.1 would fail with the following error when running on a Pitzer Cascade Lakes (48-core) node:

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic... Read more          
3 years 11 months ago 1 month 12 hours ago
Armstrong inaccessible Resolved

Update, 2PM March 12th: Armstrong is back up and running. Please notify oschelp@osc.edu of any lingering issues.

As of 10AM Thursday March 12th... Read more

10 years 3 months ago 10 years 3 months ago
NCCL hang on Ascend dual-GPU nodes Ascend, GPU, Software Resolved (workaround)

Users may encounter the following message and experience NCCL hangs if the first operation is a barrier when running multi-GPU training. We have identified... Read more

2 weeks 2 days ago 1 week 3 days ago
