Known issues

Unresolved known issues

A known issue with an Unresolved Resolution state is an active problem under investigation; a temporary workaround may be available.

Resolved known issues

A known issue with a Resolved (workaround) Resolution state is an ongoing problem for which a permanent workaround is available; the workaround may include using different software or hardware.

A known issue with a Resolved Resolution state has been corrected.

Known Issues

Title Category Resolution Description Posted Updated
Issue with submitting job array Batch, Owens Resolved

3:30 PM 5/10/2018 Original Post:

Users may have received the following error message when submitting a PBS job that uses job arrays:

qsub: submit error (Maximum number of...
7 years 2 months ago 3 years 7 months ago
Torque module on Oakley improperly setting environment variables Resolved

Intel library paths are incorrectly added to the LD_LIBRARY_PATH environment variable when the torque module is loaded. Additionally, the Intel paths remain when the torque...

10 years 4 months ago 7 years 1 month ago
PyTorch jobs timeout and hanging GPU Resolved

We have observed that many PyTorch users frequently encounter random timeouts, which result in the termination of their jobs but leave the process running on the node....

2 years 12 hours ago 1 year 6 months ago
Negative Balance Emails client portal Resolved

Negative balance emails continue to be sent once an application is submitted.

To confirm whether or not you have truly submitted an application for additional resources and that you can...

6 years 2 months ago 5 years 10 months ago
Spurious warnings about balance being exhausted client portal Resolved

Due to the price changes and some specifics about MyOSC, you may get warnings...

5 years 6 days ago 4 years 11 months ago
Performance Regression of GPU Nodes on Ruby GPU, Ruby Resolved

Ruby's GPU nodes are currently experiencing a performance regression. Some of the GPU nodes on Ruby remain in a power-saving state even after an application starts using them, resulting in...

8 years 7 months ago 7 years 1 month ago
Core label on OnDemand app is incorrect OnDemand Resolved

The core label on the OnDemand app incorrectly displays as '1', regardless of the requested number of cores for a job. While this label is incorrect, the job is still allocated the correct number...

6 months 4 days ago 5 months 2 weeks ago
2/13/2014 0730 - Reboot of login nodes Outage Resolved

We need to reboot all of the login nodes on our production clusters to fix a minor issue from the downtime. We will conduct this reboot at 7:30 AM on Thursday, February 13, 2014. We expect...

11 years 5 months ago 11 years 5 months ago
Emergency InfiniBand Shutdown (All systems) Network Resolved

We have returned to service. It appears that we have resolved the networking issues enough to allow jobs to run safely. We will continue working with our vendors to fix any remaining hardware...

10 years 11 months ago 10 years 11 months ago
ondemand outage OnDemand Resolved

Resolution notes

The problems with ondemand.osc.edu are now resolved.

Users will encounter errors using...

3 years 2 months ago 3 years 2 months ago
