We will have rolling reboots of all HPC clusters (Ascend, Cardinal, Owens, and Pitzer), including login and compute nodes, starting …

Known issues

Unresolved known issues

A known issue with an Unresolved resolution state is an active problem under investigation; a temporary workaround may be available.

Resolved known issues

A known issue with a Resolved (workaround) resolution state is an ongoing problem; a permanent workaround is available, which may involve using different software or hardware.

A known issue with a Resolved resolution state has been corrected.


Title  Category  Resolution  Description  Posted  Updated
Slurm to be Upgraded to Version 23.11.4
Category: Owens, Pitzer; Resolution: Resolved

Updates on 04/08/2024:

The rolling reboots have been completed.

Updates:

We will perform rolling reboots on this …

Posted: 8 months 1 week ago; Updated: 7 months 2 days ago
Slurm on Pitzer is offline
Resolution: Resolved

The Slurm scheduler for Pitzer is currently offline. We are working with the vendor on a fix. Sorry for the inconvenience.

Posted: 8 months 2 weeks ago; Updated: 8 months 2 weeks ago
Slurm database repair on 01/25/2024
Category: Outage; Resolution: Resolved

We have scheduled a Slurm database repair, planned to start at 8:30 am US/Eastern on Thursday, January 25, 2024. During the repair, the Slurm database will be offline; running jobs and …

Posted: 9 months 2 weeks ago; Updated: 9 months 1 week ago
Slow Processing of Password Changes
Category: Account Management, client portal; Resolution: Resolved

Password changes are taking longer than usual to process; in some test cases they took up to 17 minutes.

We are working on resolving this issue.

Posted: 5 years 4 months ago; Updated: 2 years 11 months ago
Singularity: reached your pull rate limit
Category: Owens, Pitzer, Software; Resolution: Resolved (workaround)

You might encounter an error while pulling a large Docker image:

ERROR: toomanyrequests: Too Many Requests.

or

You have reached your pull rate limit. You may …
Posted: 3 years 4 months ago; Updated: 2 years 7 months ago
Sign Up reCAPTCHA Error
Resolution: Resolved

If you do not complete the reCAPTCHA and try to submit the form, you will receive a reCAPTCHA error.

If you then complete the reCAPTCHA and resubmit, the error will remain.

…

Posted: 5 years 3 months ago; Updated: 5 years 3 days ago
Segmentation fault from openmpi/1.10-hpcx and 2.0-hpcx on Owens
Category: Owens, Software; Resolution: Resolved

We have found that recent MPI jobs using openmpi/1.10-hpcx and openmpi/2.0-hpcx on Owens may either run to completion or hang until the job is killed, and receive a segmentation fault. Some applications might be …

Posted: 5 years 3 months ago; Updated: 5 years 2 months ago
Security vulnerabilities in ARM Forge versions prior to 22.0.x
Category: Software; Resolution: Resolved (workaround)

ARM identified security vulnerabilities in ARM Forge versions prior to 22.0.x, as follows:

  • Security update #1: A locally exploitable code-injection vulnerability was identified in …
Posted: 2 years 4 months ago; Updated: 2 years 4 months ago
Security Vulnerability for GPFS
Category: filesystem; Resolution: Resolved

Update: The fix was deployed during the May 19 downtime.

Clients cannot use mm* commands to manipulate GPFS ACLs on most OSC systems due to a security vulnerability …

Posted: 4 years 6 months ago; Updated: 4 years 5 months ago
Scratch filesystem is down
Category: filesystem, OnDemand; Resolution: Resolved

Updated at 2:30 pm on Feb 1:

The scratch filesystem is back. OnDemand is also available now.

Original Post:

The scratch filesystem is down now. …

Posted: 5 years 9 months ago; Updated: 5 years 9 months ago
