Maintenance

Rolling reboot of all clusters, starting from Wednesday morning, April 19, 2017

1:40PM 4/27/2017 Update: Rolling reboots are completed. 

3:10PM 4/18/2017 Update: Rolling reboots on Owens have started to address GPFS errors that occurred late Friday.

A rolling reboot of the Owens, Oakley, and Ruby clusters is scheduled to start on Wednesday morning, April 19, 2017. Highlights of the rolling reboot activities:

Lustre is still offline. HPC systems back up

Day One of the scheduled downtime has been completed, and HPC operations have resumed. As planned, Lustre work will extend into Day Two. Jobs using /fs/lustre or $PFSDIR cannot run until this work is completed, but all other jobs can run.

UPDATE: Performance problems with Lustre have prevented us from bringing up the filesystem. We are working on a resolution.

UPDATE: Lustre returned to service the afternoon of July 12th, 2014.
