Known issues

Unresolved known issues

A known issue with an Unresolved Resolution state is an active problem under investigation; a temporary workaround may be available.

Resolved known issues

A known issue with a Resolved (workaround) Resolution state is an ongoing problem; a permanent workaround is available, which may include using different software or hardware.

A known issue with a Resolved Resolution state has been corrected.

Known Issues

Title Category Resolution Description Posted Updated
GPFS hang Issue on 09/08/2016 filesystem Resolved

On Thursday, Sept 8, starting at 19:37, there was a bad interaction that appears to have been caused by the backup client and the GPFS servers. This resulted in a GPFS hang that propagated I/O...

8 years 9 months ago 8 years 9 months ago
PyTorch jobs timeout and hanging GPU Resolved

We have observed that many PyTorch users frequently encounter random timeouts, which result in the termination of their jobs but leave the process running on the node....

1 year 11 months ago 1 year 5 months ago
8AM 9/11/13 - Brief network disruption to reboot a switch Network Resolved

At 8AM on September 11, 2013, we will be rebooting a network switch to replace a failed card in the switch. The network will be disrupted for 10 to 15 minutes while the work is done. Filesystem mounts...

11 years 9 months ago 11 years 9 months ago
Owens batch is down Owens Resolved

Updated at 9:07PM on Dec 20, 2017:

Owens batch was restored by updating Torque resource manager at 6:37pm Dec 19, 2017.

Original Post at 4:45PM on Dec 19...

7 years 6 months ago 7 years 6 months ago
Spurious warnings about balance being exhausted client portal Resolved

Due to the price changes and some specifics about MyOSC, you may get warnings...

4 years 12 months ago 4 years 11 months ago
Core label on OnDemand app is incorrect OnDemand Resolved

The core label on the OnDemand app incorrectly displays as '1', regardless of the requested number of cores for a job. While this label is incorrect, the job is still allocated the correct number...

5 months 3 weeks ago 5 months 5 days ago
Can not change GPU compute mode on Oakley GPU Resolved

Update: The driver version has been updated and the issue has been fixed.

In updating the driver version for Oakley's NVIDIA GPUs, the NVML libraries that are used in conjunction...

10 years 7 months ago 10 years 5 months ago
qsub filter rejects valid jobs Resolved

Job scripts submitted on Glenn, Oakley, or Ruby all go through a submit filter before reaching the resource manager, Torque. A bug has been discovered in our submit filter which prevents jobs with the...

10 years 3 months ago 9 years 9 months ago
Rolling reboot of all clusters, starting from 9:30 AM June 05, 2019 Batch, login, Owens, Pitzer, Ruby Resolved

Update #2 Posted on 14 June 2019 12:33 PM

The rolling reboots of all clusters are completed. Please contact oschelp@osc.edu if you...

6 years 1 month ago 6 years 2 weeks ago
ondemand outage OnDemand Resolved

Resolution notes

The problems with ondemand.osc.edu are now resolved.

Users will encounter errors using...

3 years 2 months ago 3 years 2 months ago
