HPC System Downtime

Tuesday, July 8, 2014 - 10:00am to 10:00pm

The Ohio Supercomputer Center has scheduled a two-phase downtime for all HPC systems, beginning at 6 a.m. on Tuesday, July 8, 2014. The first phase (Day One) will end at 6 p.m. that day, when the bulk of services will return to production. The second phase (Day Two) is tentatively scheduled to last until 5 p.m. on July 9, 2014. The downtime will affect the Glenn Cluster, Oakley Cluster, web portals, and HPC file servers. Login services and access to storage will not be available during Day One.

To quiesce the system for an orderly shutdown, the batch scheduler will begin holding jobs that cannot complete before 6 a.m. on July 8. Jobs that have not started by then will be held through the downtime and started once the system returns to production status.
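The hold rule above comes down to simple date arithmetic: a job can start only if its start time plus its requested walltime falls before the downtime begins. The following is a minimal sketch of that check, not the batch scheduler's actual logic. The function name `can_finish_before_downtime` is hypothetical, and treating a job that ends exactly at 6 a.m. as acceptable is an assumption.

```python
from datetime import datetime, timedelta

# Downtime start taken from the announcement: 6 a.m. on July 8, 2014 (local time).
DOWNTIME_START = datetime(2014, 7, 8, 6, 0)


def can_finish_before_downtime(start, walltime, downtime_start=DOWNTIME_START):
    """Return True if a job starting at `start` with the requested `walltime`
    would end at or before `downtime_start` (hypothetical helper)."""
    return start + walltime <= downtime_start


# A 12-hour job starting at 5 p.m. on July 7 would end at 5 a.m. on July 8.
print(can_finish_before_downtime(datetime(2014, 7, 7, 17, 0), timedelta(hours=12)))

# A 24-hour job starting at the same time would run past 6 a.m. and so would be held.
print(can_finish_before_downtime(datetime(2014, 7, 7, 17, 0), timedelta(hours=24)))
```

In practice, requesting a shorter walltime is the usual way to let a job fit into the time remaining before a scheduled downtime.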

Departmental clusters that we administer will not be affected by this outage.

Highlights of the downtime activities:

  • Upgrade to new InfiniBand network spine switches to provide room for growth for future systems.
  • Modifications to the OSC LDAP servers to improve account management.
  • Various software updates and routine maintenance.
  • Upgrade to the Lustre server and client software.

The Lustre upgrade is the reason for the second day of downtime, and Lustre is the only service expected to remain offline on Day Two. Jobs that use Lustre (those that use $PFSDIR or access /fs/lustre) will be prevented from running for the entire downtime, while jobs that do not use Lustre will be able to run once Day One is complete. We will have a better estimate of when Lustre will return to service closer to the downtime, and we will update the MOTD and osc.edu/n with more details as we have them. We will also post progress updates during Day Two of the outage on our website's front page.
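To get a rough idea of whether a job script falls into the Lustre category described above, one option is to search its text for the two markers named in this notice, $PFSDIR and /fs/lustre. The sketch below is a plain text search under that assumption. It is not how the scheduler itself identifies Lustre jobs, and the helper name `references_lustre` is hypothetical.

```python
import re

# Markers taken from the announcement: $PFSDIR (also written ${PFSDIR})
# and paths under /fs/lustre.
LUSTRE_PATTERN = re.compile(r"\$PFSDIR\b|\$\{PFSDIR\}|/fs/lustre\b")


def references_lustre(script_text):
    """Return True if the job script text mentions $PFSDIR or /fs/lustre.

    This is a simple text match, so it will also flag mentions that
    appear only in comments (hypothetical helper).
    """
    return LUSTRE_PATTERN.search(script_text) is not None


print(references_lustre("cd $PFSDIR\n./a.out\n"))
print(references_lustre("cd $HOME\n./a.out\n"))
```

A script that reaches Lustre indirectly, for example through a symbolic link or a path set in another file, would not be caught by a search like this.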

To stay up to date on system notices, please visit http://osc.edu/n or follow @HPCNotices on Twitter.