We are currently experiencing temporary instability on the Ascend login nodes.

Known issues

Unresolved known issues

A known issue with an Unresolved Resolution state is an active problem under investigation; a temporary workaround may be available.

Resolved known issues

A known issue with a Resolved (workaround) Resolution state is an ongoing problem; a permanent workaround is available which may include using different software or hardware.

A known issue with a Resolved Resolution state has been corrected.

Known Issues

Title Category Resolution Description Posted Updated
Scratch and Project are hung; scheduling has been paused Batch, filesystem Resolved

1:00PM 4/6/2017 Update: The Scratch and Project file systems are back to normal service. Scheduling on the systems has resumed. We are still investigating the cause of this problem...

9 years 2 months ago 9 years 2 months ago
Rolling reboot of Oakley and Ruby clusters, starting from 8:30AM October 9, 2017 Batch, login, Ruby Resolved

Updates on 1:00PM October 16, 2017: 

The rolling reboots of Oakley and Ruby are completed. 

...
8 years 8 months ago 8 years 7 months ago
OnDemand error when terminating interactive app OnDemand Resolved (workaround)

When trying to delete an interactive session through OnDemand, you may receive an error page about 'No such file'. This can be disregarded. Simply navigate back to the interactive sessions page...

5 years 6 months ago 4 years 1 month ago
OpenMPI-HPCX 4.1.x hangs on writing files on a shared file system Software Resolved (workaround)

Your job utilizing openmpi/4.1.x-hpcx (or 4.1.x on Ascend) might hang while writing files on a shared file system. This issue is caused by a ...

1 year 1 month ago 1 year 1 month ago
Oakley login node instability Operations Resolved

Oakley login nodes are seeing some instability related to Lustre. We will reboot the nodes on Thursday, October 2nd 2014 to resolve the issue. If a login node crashes before then and we have the...

11 years 8 months ago 11 years 7 months ago
Rolling reboots of Owens and Pitzer, starting from Tuesday, Jan 22, 2019 Batch, login, Owens Resolved

...

7 years 4 months ago 7 years 4 months ago
starccm/15.02.007 with intelmpi after Mar 22, 2022 Resolved (workaround)

STAR-CCM+ 15.02.007 and 15.02.007-mixed with intelMPI fail on multi-node jobs after the downtime on Mar 22, 2022. Please use openmpi instead. You can find more...

4 years 2 months ago 1 year 4 weeks ago
Globus Online Transfers Failing Connectivity, filesystem, Web Services Resolved

We are currently investigating multiple reports that Globus Online transfers between OSC and other sites are failing. Transfers to/from Globus Personal Endpoints do not seem to be affected.

...

10 years 2 months ago 8 years 1 week ago
GPFS filesystem Problem on Oct 24 2019 filesystem Resolved

Updated on 4:45 PM Oct 24, 2019

The issue is fixed. GPFS filesystems and OnDemand are back. 

Original Post

We are having issues with the GPFS filesystem...

6 years 7 months ago 6 years 7 months ago
MVAPICH 3.0 hang due to PMI mismatch with Slurm Software Resolved (workaround)

Applications such as Quantum ESPRESSO, LAMMPS, and NWChem experienced hangs with MVAPICH 3.0 due to a PMI mismatch. MVAPICH 3.0 was built with PMI-1, while newer Slurm versions on RHEL 9...

8 months 6 days ago 4 weeks 1 day ago
