Update on 02/24/2017: All services available

Resolution: Resolved

02/24/17 3:50PM Update: All services have been restored, including:

  • Oakley cluster with full capacity for general access
  • Ruby cluster with full capacity for restricted access
  • Owens cluster with full capacity for general access
  • Home directory, project and scratch filesystems
  • OnDemand and other web portals

All jobs held before the downtime have been released by the batch scheduler. If your jobs are still held or you have any questions, please contact oschelp@osc.edu.

 

2/24/17 11:55AM Update:

  • Owens nodes are now fully available.
  • We’ve been working with the storage vendors to address two bugs we encountered, and we have a work plan that we hope will allow us to put the Project service back in production later today.

As of now, available services include:

  • Oakley cluster with full capacity for general access
  • Ruby cluster with full capacity for restricted access
  • Owens cluster with full capacity for general access
  • Home directory and scratch filesystems
  • OnDemand and other web portals

The affected service that is NOT yet available:

  • Project filesystem (/fs/project)

Please contact oschelp@osc.edu if you have any questions. 

 

2/23/17 12:00PM Update: Partial services will be available at approximately 1PM today, including:

  • Oakley cluster including all login and compute nodes for general access
  • Ruby cluster including all login and compute nodes for restricted access
  • Owens cluster including login and half of compute nodes for general access
  • Home directory and scratch filesystems
  • OnDemand and other web portals

Affected services that will NOT be available:

  • Project filesystem (/fs/project)
  • Half of the compute nodes on Owens. Users may experience slower job scheduling on Owens

 

Instructions for clients on how to work with the partially available systems:

All jobs held before the downtime will be released by the batch scheduler once partial services are available, except for jobs whose owners have project allocations. If your jobs are still held, please follow the instructions below:

  • Double-check whether your job requires the project filesystem (/fs/project or /nfs/gpfs). For a list of your jobs, use the command: qstat -u userID (see the sketch after this list)
  • You can request the release of all jobs that do not require the project filesystem. Send a list of the jobs you'd like us to release to oschelp@osc.edu
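
As a minimal sketch of that check, assuming your job script is a local file (the name my_job.sh below is only a placeholder):

    # List your queued and held jobs (replace userID with your OSC username)
    qstat -u userID

    # Scan the job script for project-filesystem paths; any output means the
    # job depends on /fs/project or /nfs/gpfs and cannot be released yet
    grep -E '/fs/project|/nfs/gpfs' my_job.sh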

You can also submit new jobs. Please make sure that newly submitted jobs do not require the project filesystem (/fs/project or /nfs/gpfs).
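
As a rough guard for new submissions (again with a placeholder script name, and assuming the script references these paths literally), you might combine the check and the submission:

    # Submit only if the script contains no project-filesystem paths;
    # grep exits nonzero when nothing matches, so qsub runs only on a clean script
    grep -qE '/fs/project|/nfs/gpfs' new_job.sh || qsub new_job.sh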

We sincerely apologize for the inconvenience this may cause you. Please contact oschelp@osc.edu if you have any questions. 

 

2/22/17 Update: We have to extend the downtime (which began at 7AM on Feb 21st) to complete filesystem checks. Login services, including all OSC systems, my.osc.edu, and access to storage, are not available. We don't yet know when all services will be back; we'll keep you posted once we know more.

For more information on this downtime, see: https://www.osc.edu/calendar/events/2017_02_21-two_day_system_downtime

We sincerely apologize for the inconvenience this may cause you. Please contact oschelp@osc.edu if you have any questions.