Rolling reboot of Owens cluster, starting from 8:30AM Oct 30, 2017
We will have rolling reboots of Oakley and Ruby clusters starting from 8:30AM on Monday October 9, 2017.
We will have a rolling reboot of Owens starting from 9AM on Monday, September 11, 2017.
All PBS commands on Owens are working now
There is a bug in VASP 5.4.1 built with mvapich2/2.2 on Owens: a VASP job that runs out of memory crashes the Owens compute node(s). We will investigate monitoring for this type of job so that we can clean up after the job more efficiently and notify the user of the problem more quickly.
3:10PM 4/18/2017 Update: Rolling reboots on Owens have started to address this GPFS issue.
We have had issues with GPFS mounts on Owens Cluster since Friday afternoon, April 14, 2017. The affected nodes have been marked offline to be restarted or rebooted to fix this issue. Jobs may have been negatively impacted by this issue since April 14. If you experience any 'stale file handle' or file not found errors, please let us know.
1:40PM 4/27/2017 Update: Rolling reboots are completed.
3:10PM 4/18/2017 Update: Rolling reboots on Owens have started to address GPFS errors that occurred late Friday.
Rolling reboot of Owens, Oakley, and Ruby clusters is scheduled to start from Wednesday morning, April 19, 2017. Highlights of the rolling reboot activities:
1:00PM 4/6/2017 Update: The Scratch and Project file systems are back to normal service. Scheduling on the systems has resumed. We are still investigating the cause of this problem and will keep you updated when we know more.
The Scratch and Project file systems are currently hung. Scheduling on all three clusters (Owens, Ruby, and Oakley) has been paused while we investigate this problem. We will update this page when we know more.
Starting from Thursday, Feb 2nd, the $PFSDIR directory on scratch (/fs/scratch) will no longer be created by the job prologue. For example, if you simply use the command cd $PFSDIR, you will get an error indicating that this directory does not exist. We are making this change to address recent problems with the batch environments on OSC’s clusters. If you use this directory, you will have to create $PFSDIR yourself. Please include the following additional lines in the job script.
If you use bash:
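A minimal sketch of such lines, assuming $PFSDIR is already set by the batch system inside the job. The mktemp fallback is a hypothetical placeholder, not part of the OSC environment, added only so the snippet also runs outside a job:

```shell
# Inside a batch job, PFSDIR is expected to be set by the batch system.
# Hypothetical fallback (an assumption for illustration only): use a
# temporary location when PFSDIR is unset, e.g. when testing outside a job.
: "${PFSDIR:=$(mktemp -d)/pfsdir-example}"

# Create the directory only if it does not exist yet; -p also creates any
# missing parent directories and does not fail if the directory is present.
if [ ! -d "$PFSDIR" ]; then
    mkdir -p "$PFSDIR"
fi

# The directory now exists, so changing into it no longer produces an error.
cd "$PFSDIR"
```

Placing these lines near the top of the job script, before the first use of $PFSDIR, keeps later commands that read or write there from failing.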
We have noticed that some Abaqus jobs end up in BatchHold. Once a job is in BatchHold, it will never start. This happens because the Abaqus licenses are shared between Oakley and Owens. We have opened a support request with the job scheduler vendor.