All HPC systems are available

Resolution: Resolved

8/24/16 3:57PM: All HPC systems are available, including:

  • Oakley cluster for general access
  • Ruby cluster for restricted access
  • Owens cluster for early users
  • Home directory and scratch file systems
  • OnDemand and other web portals
  • Project file system (/fs/project)

All jobs held before the downtime have been released by the batch scheduler. If your jobs are still held or you have any questions, please contact oschelp@osc.edu.

8/22/16 5:00PM: Some of our services are now available, including:

  • Oakley cluster for general access
  • Ruby cluster for restricted access
  • Owens cluster for early users
  • Home directory and scratch file systems
  • OnDemand and other web portals

The following service is not yet available:

  • Project file system (/fs/project)

Instructions for working on the partially restored systems:

All jobs held before the downtime have been released by the batch scheduler, except for jobs whose owners have project allocations. If your jobs are still held, please follow the instructions below:

  • Double-check whether your job requires the project file system (/fs/project or /nfs/gpfs). To list your jobs, use the command: qstat -u userID
  • You can ask us to release any jobs that do not require the project file system. Send the list of jobs you would like released to oschelp@osc.edu
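
If you have many job scripts to review, the check above can be partly automated. The following is a minimal sketch, not an OSC-provided tool: the function name references_project_fs is hypothetical, and it assumes your job scripts are plain shell/PBS text that you can read from a file system that is currently available. It only looks for the two paths named in this notice and cannot detect indirect references such as symbolic links or environment variables.

```python
import re

# The two project file system paths named in this notice.
# \b keeps look-alike names such as /fs/projects_old from matching.
PROJECT_FS_PATTERN = re.compile(r"/fs/project\b|/nfs/gpfs\b")


def references_project_fs(script_text: str) -> bool:
    """Return True if a job script mentions a project file system path.

    Hypothetical helper for illustration only. Whole-line shell comments
    are skipped, but "#PBS" directives are checked because they can carry
    output or error paths.
    """
    for line in script_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#") and not stripped.startswith("#PBS"):
            continue
        if PROJECT_FS_PATTERN.search(stripped):
            return True
    return False
```

Pass the text of each job script to the function; any script for which it returns True should stay held until the project file system is back. The check is deliberately conservative: a path that appears only in a trailing comment on a command line is still reported.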

You can also submit new jobs. Please make sure that newly submitted jobs do not require the project file system (/fs/project or /nfs/gpfs).

8/22/16 12:36PM: All HPC systems remain unavailable. We are working to restore partial services and will post an update at 4 p.m. today.

8/20/16 9:00AM: All HPC systems remain unavailable. We've made significant progress in bringing the systems back.

8/19/16 9:30AM: All HPC systems remain unavailable.

Reason: We’ve been having issues with the file system check on the Project service.

Current Status: We don’t have an expected return time yet, but we are actively working to get the systems back up as soon as we can. We will post an update when we know more.