Owens
LOGIN NODES NOT AVAILABLE THURSDAY JAN. 17 NOON-3PM
We will replace Ethernet switches from noon to 3pm on Thursday, Jan 17, which will impact all login nodes and 2 quick nodes on Owens. As a result, users won't be able to log into Owens or use Owens VDI through OnDemand during the maintenance. Running jobs on Owens, as well as other OSC services (Pitzer, Ruby, and filesystems), won't be impacted.
Owens switch replacements
OSC will replace the Ethernet switches in the Owens cluster starting Dec 14. We do not expect any user-visible impact from the work. Owens will have slightly reduced capacity while we temporarily shut down 2 or 3 racks on the day of each replacement. For more information, see: https://bit.ly/2Qkq0ct
Pitzer Production Deployment December 4
Pitzer, OSC's latest cluster, will be deployed to full production status on Tuesday, December 4. All users will have access to the cluster and will be able to submit jobs. For details on how to modify your jobs to run on Pitzer, please see https://bit.ly/2P7G4Zz
For general information about the new cluster, please visit osc.edu and see our Cluster Computing pages. If you have any questions, please contact OSC Help: https://bit.ly/29AXmdf
Services have been restored after switch failure
At about 1:50 am on November 14th, 4:05 am on November 17th, and 5:00 am on November 18th, OSC experienced three separate major switch failures. We restored all services after each outage and have completed the update to the NetApp appliance that provides the home directory service, which addresses a separate bug triggered by the outage. We are still working with the network switch vendor on a permanent resolution to the bug that caused these interruptions. We will continue to keep you informed.
Switch failure on Nov 17 2018
At about 4:05 am on November 17th, OSC experienced a major switch failure which resulted in the home directory service and GPFS file systems being disrupted. Most services were back up around 10 am, but some users may still be seeing stale file handles on GPFS. We are still working on recovering GPFS clients. For more updates, see: https://bit.ly/2DIXr1G
Reboot of NetApp as part of an upgrade on November 19
We will reboot the NetApp as part of an upgrade, starting at 9:30 AM on Monday, November 19, 2018, to address a NetApp bug triggered by the network switch outage we had on Nov 14, 2018. Cluster nodes, the OnDemand service, and all filesystems won't be impacted by the reboot. We also do not expect any disruption to users' jobs due to this reboot.
Major network switch outage on November 14, 2018
At about 1:50 AM on November 14th, OSC experienced a major switch failure which disrupted the home directory service. As a result, the home directories were offline and logins to all clusters were failing. All user-facing issues have been resolved and the services are back. Running jobs may have recovered, but please check your job output to verify correctness. Some jobs failed and will need to be resubmitted. For more information, see: https://bit.ly/2FlCZFD
NBO 6.0 now available on Owens
Date:
Friday, November 2, 2018 - 12:00pm
System(s):
NBO 6.0 has been installed on the Owens cluster; it is available through the module nbo/6.0. For information on available executables and installation details, see the software page for NBO.