Rolling reboot of Owens cluster, starting from 9AM September 11, 2017
We will have a rolling reboot of Owens starting at 9AM on Monday, September 11, 2017.
All PBS commands on Owens are working now
The rolling reboot of the login and compute nodes of the Owens cluster is complete.
3:10PM 4/18/2017 Update: Rolling reboots on Owens have started to address this GPFS issue.
We have had issues with GPFS mounts on the Owens cluster since Friday afternoon, April 14, 2017. The affected nodes have been marked offline and will be restarted or rebooted to fix this issue. Jobs may have been negatively impacted by this issue since April 14. If you experience any 'stale file handle' or 'file not found' errors, please let us know.
1:40PM 4/27/2017 Update: Rolling reboots are completed.
3:10PM 4/18/2017 Update: Rolling reboots on Owens have started to address the GPFS errors that occurred late Friday.
A rolling reboot of the Owens, Oakley, and Ruby clusters is scheduled to start on Wednesday morning, April 19, 2017. Highlights of the rolling reboot activities:
3:45PM April 3, 2017 Update: GPU nodes on Owens are available.
206 Owens nodes are not accessible to users due to GPU testing and a bad Ethernet switch. The 48 nodes affected by the switch problem are expected to be available by Friday, March 31, and the remaining nodes, held for GPU testing, are expected to be available on Monday, April 3, 2017.
We apologize for the inconvenience this may cause you. Please contact oschelp@osc.edu if you have any questions.
4:56PM 3/28/2017 Update: The rolling reboots of all systems are completed.
Some MVAPICH2 MPI installations on Oakley, Ruby, and Owens, such as the default module mvapich2/2.2 as well as mvapich2/2.1, appear to have a bug that is triggered by certain programs. The symptoms are 1) the program hangs or 2) the program fails with an error related to Allreduce or Bcast.