Known issues

Unresolved known issues

A known issue with an Unresolved Resolution state is an active problem under investigation; a temporary workaround may be available.

Resolved known issues

A known issue with a Resolved (workaround) Resolution state is an ongoing problem; a permanent workaround is available, which may include using different software or hardware.

A known issue with a Resolved Resolution state has been corrected.

Known Issues

Title Category Resolution Description Posted Updated
Scheduling suspended Batch Resolved

We have temporarily suspended scheduling due to some problems with the parallel scratch file system.

9 years 10 months ago 9 years 10 months ago
NFS outage on Thursday Jan 17 from 7am to 8am filesystem Resolved

Update:

This work has been canceled and will be done during downtime on Feb. 5. 

Original Post:

On Thursday, January 17th from 7 am to 8 am OSC will have a GPFS...

5 years 6 months ago 5 years 6 months ago
PyTorch jobs timeout and hanging GPU Resolved

We have observed that many PyTorch users frequently encounter random timeouts, which result in the termination of their jobs but leave the process running on the node....

1 year 2 weeks ago 6 months 3 weeks ago
June 7th downtime to finish at 6:30PM Connectivity, filesystem, Infrastructure, login, Login Problems, Maintenance, Operations, Outage Resolved

Update: Downtime completed at 6:30PM, June 7th.

The June 7th downtime is now slated to be completed at 6:30PM. The previous estimate was 5PM.

All systems and services will...

8 years 1 month ago 8 years 1 month ago
Spurious warnings about balance being exhausted client portal Resolved

Due to the price changes and some specifics about MyOSC, you may get warnings...

4 years 3 weeks ago 3 years 11 months ago
Poor network performance on some filesystems filesystem Resolved

We are experiencing some network performance issues on a cluster of servers involved with providing GPFS and some project filesystems. GPFS appears to be functioning acceptably, but proj01, proj02...

11 years 6 days ago 11 years 5 days ago
Rolling reboot of Pitzer cluster, starting from Feb 03, 2021 Batch, login, Pitzer Resolved

Updates at 10AM Feb 11, 2021:

The rolling reboot is completed. 

Original Post:

We will have rolling reboots of the Pitzer cluster including...

3 years 5 months ago 3 years 5 months ago
February 11 2014 Scheduled Downtime Outage Resolved

HPC systems are offline today for scheduled quarterly maintenance activity. For details, please visit osc.edu/n

10 years 5 months ago 10 years 5 months ago
Job failures on some rolling-rebooted nodes on Owens since April 16, 2018 Owens Resolved

3:35 PM 4/30/2018 Update:

The cause is that NFSv4.1 is not configured correctly after the OS on Owens was updated from RHEL 7.3 to 7.4. We re-rebooted the Owens compute nodes...

6 years 3 months ago 6 years 2 months ago
ondemand outage OnDemand Resolved

Resolution notes

The problems with ondemand.osc.edu are now resolved.

Users will encounter errors using...

2 years 3 months ago 2 years 3 months ago
