Switch failure on Nov 17 2018

Category: 
Resolution: Resolved

Updates at 12:51 PM Nov 19:

At about 5:00 am on November 18th, OSC experienced another major switch failure. As of noon on Nov 18, all services have been fully restored. In addition, we have completed the update to the NetApp appliance that provides the home directory service. This update had been planned for Nov 19; we completed it early in the hope that it will lessen the impact should another switch failure occur.

We are still working with the network switch vendor on a permanent fix for the bug that has caused these interruptions. We will continue to keep you informed.

Updates at 2:10 PM Nov 17:

All services are back. We have implemented a temporary workaround to prevent the same switch failure from recurring. We will keep monitoring the systems to make sure they remain healthy. We have opened a ticket with the vendor to obtain a proper fix.

If you experience any unexpected behavior, contact OSC Help.

Original Post:

At about 4:05 am on November 17th, OSC experienced a major switch failure that disrupted the home directory service and the GPFS file systems. Most services were back up around 10 am, but some users may still see stale file handles on GPFS. We are still working on recovering GPFS clients. If you experience any unexpected behavior, contact OSC Help.