We are currently experiencing temporary instability on the Ascend login nodes.

Known issues

Unresolved known issues

A known issue with an Unresolved Resolution state is an active problem under investigation; a temporary workaround may be available.

Resolved known issues

A known issue with a Resolved (workaround) Resolution state is an ongoing problem; a permanent workaround is available, which may include using different software or hardware.

A known issue with a Resolved Resolution state has been corrected.

Known Issues

Title Category Resolution Description Posted Updated
Backup failures on ess filesystem Backups, filesystem Resolved

The backups on the /fs/ess filesystem are having issues running. There has not been a successful backup of this filesystem since Sunday, 08 August 2021.

OSC is working with the vendor to...

4 years 10 months ago 4 years 10 months ago
OSC Service Outage Notification Outage Resolved

We are currently experiencing outages affecting multiple services, including OnDemand (ondemand.osc.edu) and login nodes of HPC systems. Our team is actively investigating and working to resolve...

12 months 1 day ago 12 months 16 hours ago
June 7th downtime to finish at 6:30PM Connectivity, filesystem, Infrastructure, login, Login Problems, Maintenance, Operations, Outage Resolved

Update: Downtime completed at 6:30PM, June 7th.

The June 7th downtime is now slated to be completed at 6:30PM. The previous estimate was 5PM.

All systems and services will...

10 years 3 days ago 10 years 14 hours ago
MPI_THREAD_MULTIPLE is not supported with OpenMPI-HPCX 4.x Owens, Software Resolved

A threaded MPI code in which MPI_Init_thread requests MPI_THREAD_MULTIPLE will fail because UCX from the HPCX package is built without multi-threading enabled. UCX is the...

3 years 3 months ago 1 year 1 month ago
Poor network performance on some filesystems filesystem Resolved

We are experiencing some network performance issues on a cluster of servers involved with providing GPFS and some project filesystems. GPFS appears to be functioning acceptably, but proj01, proj02...

12 years 10 months ago 12 years 10 months ago
vLLM versions prior to 0.14.1 are deprecated Software Resolved

vLLM versions prior to 0.14.1 are deprecated due to the security issue CVE-2026-22778, which can allow remote code execution. Clients are advised to use versions 0....

3 months 3 weeks ago 3 months 3 weeks ago
Rolling reboot of Owens cluster, starting from 8:30AM Oct 30, 2017 Batch, Owens Resolved

Updated on Nov 21, 2017 at 3:33PM:

The rolling reboot has been completed.

Updated on October 20, 2017 at 4:19PM:

We will have a rolling reboot of Owens...

8 years 7 months ago 8 years 6 months ago
Maintenance outage on the cluster export services Maintenance, OnDemand, Ruby Resolved

Update on 14 April 2020, 0903:

Work is completed.

Original message:

There will be maintenance on cluster export services on Tuesday, April...

6 years 2 months ago 6 years 1 month ago
Rolling reboots on all HPC systems starting Oct 31 2024 Owens, Pitzer Resolved

Updates on Nov 13 2024:

Pitzer is completed. 

Updates...

1 year 7 months ago 1 year 6 months ago
Scheduling suspended Batch Resolved

We have temporarily suspended scheduling due to some problems with the parallel scratch file system.

11 years 8 months ago 11 years 8 months ago
