Known issues

Unresolved known issues

A known issue with an Unresolved Resolution state is an active problem under investigation; a temporary workaround may be available.

Resolved known issues

A known issue with a Resolved (workaround) Resolution state is an ongoing problem; a permanent workaround is available, which may include using different software or hardware.

A known issue with a Resolved Resolution state has been corrected.

Known Issues

Title Category Resolution Description Posted Updated
Fail to connect with VS Code 1.86 Resolved

VS Code 1.86 (aka the ‘January 2024’ update) requires glibc ≥ 2.28, which is not supported on the Pitzer and Owens clusters. Please downgrade to VS Code 1.85.

See this link for more... Read more

8 months 6 days ago 5 months 2 weeks ago
Backup Issues Backups Resolved

Updates on 04/17:

OSC has conducted thorough validations to ensure the integrity of our backup data for user home directories and the /fs/ess filesystem. 

... Read more

6 months 1 week ago 5 months 3 weeks ago
Multi-node job hang with ORCA 5 Owens, Pitzer, Software Resolved (workaround)

You may experience a multi-node job hang if the job runs a module that requires heavy I/O, e.g., MP2 or CCSD. Additionally, this can lead to GPFS performance issues. We have... Read more

6 months 3 days ago 6 months 3 days ago
Slurm to be Upgraded to Version 23.11.4 Owens, Pitzer Resolved

Updates on 04/08/2024:

The rolling reboots are completed. 

Updates:

We will perform rolling reboots on this... Read more

7 months 1 week ago 6 months 5 days ago
Slurm on Pitzer is offline Resolved

The Slurm scheduler for Pitzer is currently offline. We are working with the vendor on a fix. Sorry for the inconvenience.

7 months 2 weeks ago 7 months 2 weeks ago
Slurm database repair on 01/25/2024 Outage Resolved

We have scheduled a Slurm database repair, which is planned to start at 8:30 am US/Eastern on Thursday, January 25, 2024. During the repair, the Slurm database will be offline; running jobs and... Read more

8 months 3 weeks ago 8 months 2 weeks ago
PyTorch jobs timeout and hanging GPU Resolved

We have observed that many PyTorch users frequently encounter random timeouts, which result in the termination of their jobs but leave the process running on the node.... Read more

1 year 3 months ago 9 months 1 week ago
Rolling reboot of Ascend, Owens and Pitzer starting from Oct 25 2023 Owens, Pitzer Resolved

Update on Nov 8 2023:

Rolling reboots of all clusters are completed. 

Update on Nov 3 2023:

Rolling reboots of Ascend and Pitzer clusters... Read more

11 months 3 weeks ago 11 months 1 week ago
Running jobs requeued on all clusters Owens, Pitzer Resolved

The Slurm upgrades we performed today (Oct 25 2023) during the rolling reboots of Ascend, Owens and Pitzer caused all running jobs on these systems to be requeued around 8:45 am. You will not be billed for the... Read more

11 months 3 weeks ago 11 months 2 weeks ago
Emergency UPS Maintenance Maintenance Resolved

A UPS in the data center requires some emergency maintenance to be undertaken at 2PM on Oct 11 2023. There is a very small chance that parts of Owens and of the C18 Pitzer nodes may lose power as... Read more

1 year 45 min ago 12 months 2 hours ago
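
For the VS Code 1.86 entry above, the glibc version of a host can be checked before deciding whether a downgrade is needed. The following is a minimal sketch, not an OSC-provided tool: the helper names `parse_version` and `meets_glibc_requirement` are hypothetical, and the script is assumed to run on the remote host that VS Code connects to (for example, a cluster login node), not on the local workstation.

```python
import platform

# Minimum glibc version that the VS Code 1.86 entry says is required.
REQUIRED_GLIBC = (2, 28)


def parse_version(text):
    """Turn a dotted version string such as '2.17' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def meets_glibc_requirement(version_text, required=REQUIRED_GLIBC):
    """Return True if version_text is at least the required glibc version."""
    return parse_version(version_text) >= required


if __name__ == "__main__":
    # platform.libc_ver() inspects the running Python executable and returns
    # empty strings when the C library cannot be determined.
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        print("Could not determine the glibc version on this host.")
    elif meets_glibc_requirement(version):
        print(f"glibc {version} meets the 2.28 minimum.")
    else:
        print(f"glibc {version} is older than 2.28.")
```

Versions are compared as integer tuples rather than strings, so that a value such as 2.9 would not be treated as newer than 2.28.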