There have been problems with PBS/Torque commands, including
qstat, on the Ruby login nodes since this morning.
Rolling reboots of all three clusters, starting from Tuesday, September 4, 2018
We will have rolling reboots of the Owens and Ruby clusters starting at 8 AM on Monday, August 6, 2018.
Rolling reboots of all clusters, starting from 8 AM Tuesday, June 19, 2018
We will have rolling reboots of the Oakley, Ruby, and Owens clusters starting on Monday, February 5, 2018.
We will have rolling reboots of the Oakley and Ruby clusters starting at 8:30 AM on Monday, October 9, 2017.
1:40PM 4/27/2017 Update: Rolling reboots are completed.
3:10PM 4/18/2017 Update: Rolling reboots on Owens have started, to address GPFS errors that occurred late Friday.
A rolling reboot of the Owens, Oakley, and Ruby clusters is scheduled to start on Wednesday morning, April 19, 2017. Highlights of the rolling reboot activities:
4:56PM 3/28/2017 Update: The rolling reboots of all systems are completed.
We are currently seeing a performance regression on Ruby's GPU nodes. Some of the GPU nodes on Ruby remain in a power-saving state even after an application starts using them, which reduces performance in some cases. We currently have a reservation on the GPU nodes so that we can do a rolling reboot to return them to a known-good state.
We have opened a bug report with the vendor about this performance regression and how to monitor for it.