Known issues

Unresolved known issues

Known issue with an Unresolved Resolution state is an active problem under investigation; a temporary workaround may be available.

Resolved known issues

A known issue with a Resolved (workaround) Resolution state is an ongoing problem; a permanent workaround is available which may include using different software or hardware.

A known issue with Resolved Resolution state has been corrected.

Known Issues

Title Category Resolutionsort ascending Description Posted Updated
PyTorch jobs timeout and hanging GPU Resolved

We have observed that many PyTorch users frequently encounter random timeouts, which result in the termination of their jobs but leave the process running on the node.... Read more

1 year 12 months ago 1 year 6 months ago
Negative Balance Emails client portal Resolved

Negative balance emails continue to be sent once an application is submitted.

To confirm whether or not you have truly submitted an application for additional resources and that you can... Read more

6 years 2 months ago 5 years 10 months ago
Spurious warnings about balance being exhausted client portal Resolved

Due to the price changes and some specifics about MyOSC, you may get warnings... Read more

5 years 2 days ago 4 years 11 months ago
Performance Regression of GPU Nodes on Ruby GPU, Ruby Resolved

We currently have performance regression of Ruby's GPU nodes. Some of the GPU nodes on Ruby will remain in a power-saving state even after an application starts using them, resulting in... Read more

8 years 7 months ago 7 years 1 month ago
Core label on OnDemand app is incorrect OnDemand Resolved

The core label on the OnDemand app incorrectly displays as '1', regardless of the requested number of cores for a job. While this label is incorrect, the job is still allocated the correct number... Read more

6 months 11 hours ago 5 months 1 week ago
2/13/2014 0730 - Reboot of login nodes Outage Resolved

We need to reboot all of the login nodes on our production clusters to fix a minor issue from the downtime. We will be conducting this reboot at 7:30AM on Thursday, February 13th 2014. We expect... Read more

11 years 4 months ago 11 years 4 months ago
Emergency InfiniBand Shutdown (All systems) Network Resolved

We have returned to service. It appears that we have resolved the networking issues enough to allow jobs to run safely. We will continue working with our vendors to fix any remaining hardware... Read more

10 years 11 months ago 10 years 11 months ago
ondemand outage OnDemand Resolved

Resolution notes

The problems with ondemand.osc.edu are now resolved.

Users will encounter errors using... Read more

3 years 2 months ago 3 years 2 months ago
LS-DYNA License problems on all clusters Software Resolved

We are experiencing problems with the LS-DYNA license server. We will resume working on restoring access to this software on October 24th, 2018.

==

It has been fixed and working... Read more

6 years 8 months ago 6 years 8 months ago
Large MPI job startup hang with mvapich2/2.3 and mvapich2/2.3.1 Owens, Pitzer, Software Resolved
(workaround)

We have found that large MPI jobs may hang at startup with mvapich2/2.3 and mvapich/2.3.1 (on any compiler dependency) due to a known bug that has been fixed in release 2... Read more

5 years 7 months ago 1 month 3 weeks ago

Pages