Known issues

Unresolved known issues

A known issue with an Unresolved Resolution state is an active problem under investigation; a temporary workaround may be available.

Resolved known issues

A known issue with a Resolved (workaround) Resolution state is an ongoing problem; a permanent workaround is available, which may include using different software or hardware.

A known issue with a Resolved Resolution state has been corrected.

Known Issues

Title Category Resolution Description Posted Updated
PyTorch jobs timeout and hanging GPU Resolved

We have observed that many PyTorch users frequently encounter random timeouts, which result in the termination of their jobs but leave the process running on the node....

1 year 5 months ago 11 months 1 week ago
qsub filter rejects valid jobs Resolved

Job scripts submitted on Glenn, Oakley, or Ruby all go through a submit filter before reaching the resource manager, Torque. A bug has been discovered in our submit filter which prevents jobs with the...

9 years 8 months ago 9 years 2 months ago
quota exceeded error when using chgrp in /fs/ess directories filesystem Resolved

Users may receive an error when using the chgrp command on data in /fs/ess/ locations.

$ chgrp -v PEX1234 my-file.txt
chgrp: changing group of 'my-file.txt': Disk quota exceeded
failed...
1 year 10 months ago 1 year 9 months ago
Replacement of Owens Ethernet switches from Dec 14, 2018 Network, Owens Resolved

Updated on Jan 16, 2019, at 09:20 AM:

The replacement is done except for the three switches including the login nodes of Owens. We posted another notice for more...

6 years 3 months ago 5 years 11 months ago
Rolling reboot of all clusters, starting from 8 AM Tuesday, June 19, 2018 Batch, Owens, Ruby Resolved

Posted on June 12, 2018, at 4:40 PM:

We will have rolling reboots of three clusters (Owens, Ruby, and Oakley) including login and compute nodes, starting from 8 AM Tuesday...

6 years 6 months ago 6 years 5 months ago
Rolling reboot of all clusters, starting from 9:30 AM June 05, 2019 Batch, login, Owens, Pitzer, Ruby Resolved

Update #2 Posted on 14 June 2019 12:33 PM

The rolling reboots of all clusters are completed. Please contact oschelp@osc.edu if you...

5 years 6 months ago 5 years 6 months ago
Rolling reboot of all clusters, starting from Wednesday morning, April 19, 2017 Batch, Maintenance, Owens, Ruby Resolved

1:40PM 4/27/2017 Update: Rolling reboots are completed. 

3:10PM 4/18/2017 Update: Rolling reboots on Owens have started to address GPFS errors occurred...

7 years 8 months ago 7 years 7 months ago
Rolling reboot of Ascend, Owens and Pitzer starting from Oct 25 2023 Owens, Pitzer Resolved

Update on Nov 8 2023:

Rolling reboots of all clusters are completed. 

Update on Nov 3 2023:

Rolling reboots of Ascend and Pitzer clusters...

1 year 1 month ago 1 year 1 month ago
Rolling reboot of compute and login nodes of all clusters, starting from Wednesday morning, March 22, 2017 login, Owens, Ruby Resolved

4:56PM 3/28/2017 Update: The rolling reboots of all systems are completed. 

All compute nodes and login nodes of Owens, Oakley, and Ruby clusters will need to be rebooted...

7 years 9 months ago 7 years 8 months ago
Rolling reboot of login nodes of clusters at 7:00AM Dec 19, 2017 login Resolved

We will have a rolling reboot of login nodes of clusters at 7:00AM Dec 19, 2017 for a GPFS version upgrade. It is expected to be completed in a short period of time. If you encounter any login issues,...

6 years 12 months ago 6 years 11 months ago
