Known issues

Unresolved known issues

A known issue with an Unresolved resolution state is an active problem under investigation; a temporary workaround may be available.

Resolved known issues

A known issue with a Resolved (workaround) resolution state is an ongoing problem; a permanent workaround, which may involve using different software or hardware, is available.

A known issue with a Resolved resolution state has been corrected.

Known Issues

Title | Category | Resolution | Description | Posted | Updated
Issue with submitting job array | Batch, Owens | Resolved

3:30 PM 5/10/2018 Original Post:

Users may have received the following error message when submitting a PBS job that uses job arrays:

qsub: submit error (Maximum number of...
Posted 7 years 1 month ago | Updated 3 years 7 months ago
GPFS errors on compute nodes | filesystem | Resolved

We have seen an increase in transient problems that cause compute nodes to lose access to the GPFS file systems for about 5 minutes.

Any jobs running on these nodes that access files on GPFS may...

Posted 4 years 7 months ago | Updated 3 years 7 months ago
Torque module on Oakley improperly setting environment variables | Resolved

Intel library paths are incorrectly added to the LD_LIBRARY_PATH environment variable when the torque module is loaded. Additionally, the Intel paths remain when the torque...

Posted 10 years 4 months ago | Updated 7 years 1 month ago
PyTorch hangs on dual-GPU node on Ascend | Ascend, GPU | Resolved (workaround)

PyTorch can hang on Ascend dual-GPU nodes.

Internal testing has confirmed that the hang occurs only on Ascend dual-GPU (nextgen) nodes. We're still unsure why...

Posted 2 months 5 days ago | Updated 1 month 4 weeks ago
Negative Balance Emails | client portal | Resolved

Negative balance emails continue to be sent even after an application has been submitted.

To confirm whether you have actually submitted an application for additional resources and that you can...

Posted 6 years 1 month ago | Updated 5 years 10 months ago
Missing shared library in some mvapich2 modules | Owens, Pitzer | Resolved

Update on Feb 25, 2022:

This issue has been fixed.

Original Post:

Users may encounter a missing shared library with some mvapich2 modules...

Posted 3 years 4 months ago | Updated 3 years 4 months ago
Performance Regression of GPU Nodes on Ruby | GPU, Ruby | Resolved

There is currently a performance regression on Ruby's GPU nodes. Some GPU nodes on Ruby remain in a power-saving state even after an application starts using them, resulting in...

Posted 8 years 7 months ago | Updated 7 years 1 month ago
LS-DYNA license outage since Oct 15, 2019 | Licensing | Resolved

Update at 1:47 PM, Oct 16, 2019:

The LS-DYNA license server has been operational again since 1:40 PM on Oct 16, 2019.

Original Post:

The LS-DYNA...

Posted 5 years 8 months ago | Updated 5 years 8 months ago
Rolling reboot of Owens and Pitzer starting February 3, 2020 | login, Owens, Pitzer | Resolved

2:11 PM 2/17/2020 Update:

The rolling reboot of Owens has been completed.

12:41 PM 2/10/2020 Update:

The rolling reboot of Pitzer has been...

Posted 5 years 5 months ago | Updated 5 years 4 months ago
Emergency InfiniBand Shutdown (All systems) | Network | Resolved

All systems have returned to service. The networking issues appear to be resolved sufficiently for jobs to run safely. We will continue working with our vendors to fix any remaining hardware...

Posted 10 years 11 months ago | Updated 10 years 11 months ago
