Rolling reboots of all three clusters, starting from Tuesday, September 4, 2018
Rolling reboots of all clusters, starting from 8 AM Tuesday, June 19, 2018
Users may have received the following error message when trying to submit a PBS job using job arrays
We will have rolling reboots of the Oakley, Ruby, and Owens clusters starting Monday, Feb 5, 2018.
qstat: cannot connect to server oak-batch-test.osc.edu on Oakley between approximately 3:00 and 3:30 PM on Nov 21, 2017.
We will have rolling reboots of the Oakley and Ruby clusters starting at 8:30 AM on Monday, October 9, 2017.
12:35PM 5/24/2017 Update:
pbsdcp has been fixed on Oakley.
pbsdcp is not working on Oakley and returns a missing library error as below:
1:40PM 4/27/2017 Update: Rolling reboots are completed.
3:10PM 4/18/2017 Update: Rolling reboots on Owens have started to address GPFS errors that occurred late Friday.
A rolling reboot of the Owens, Oakley, and Ruby clusters is scheduled to start on Wednesday morning, April 19, 2017. Highlights of the rolling reboot activities:
4:56PM 3/28/2017 Update: The rolling reboots of all systems are completed.
Some MVAPICH2 MPI installations on Oakley, Ruby, and Owens, such as the default module mvapich2/2.2 as well as mvapich2/2.1, appear to have a bug that is triggered by certain programs. The symptoms are 1) the program hangs or 2) the program fails with an error related to Allreduce or Bcast.