Owens

Owens batch is down

Updated at 9:07PM on Dec 20, 2017:

Owens batch was restored at 6:37PM on Dec 19, 2017 by updating the Torque resource manager.

Original Post at 4:45PM on Dec 19, 2017:

Owens batch has been down since approximately 4PM on Dec 19, 2017, returning the following message:

Issue with GPFS on Owens since April 14, 2017

3:10PM 4/18/2017 Update: Rolling reboots on Owens have started to address this GPFS issue. 

We have had issues with GPFS mounts on the Owens cluster since Friday afternoon, April 14, 2017. The affected nodes have been marked offline so they can be restarted or rebooted to fix this issue. Jobs may have been negatively impacted since April 14. If you experience any 'stale file handle' or 'file not found' errors, please let us know.

Rolling reboot of all clusters, starting from Wednesday morning, April 19, 2017

1:40PM 4/27/2017 Update: Rolling reboots are completed. 

3:10PM 4/18/2017 Update: Rolling reboots on Owens have started to address the GPFS errors that occurred late Friday.

A rolling reboot of the Owens, Oakley, and Ruby clusters is scheduled to start on Wednesday morning, April 19, 2017. Highlights of the rolling reboot activities:

Owens is in Partial Service

3:45PM April 3, 2017 Update: GPU nodes on Owens are available. 

206 Owens nodes are not accessible to users due to GPU testing and a bad Ethernet switch. The 48 nodes affected by the switch problem are expected to be available by Friday, March 31, and the remaining nodes, held for GPU testing, are expected to be available on Monday, April 3, 2017.

We apologize for the inconvenience this may cause you. Please contact oschelp@osc.edu if you have any questions. 
