15 July 2016, 5:00PM update: some additional issues we are facing

Known Issues

Title Cat. Res. Description Post Upd.
Oakley login node instability Operations Resolved

Oakley login nodes are seeing some instability related to Lustre. We will reboot the nodes on Thursday, October 2nd 2014 to resolve the issue. If a login node crashes before then and we have the...

1 year 10 months ago 1 year 8 months
Statewide Intel compiler license checkout failures Licensing Resolved

This morning (9/10/14) we updated our Intel compiler licenses. We are seeing some unexpected license checkout failures in the logs (please click through to see details):

10:44:...
1 year 10 months ago 1 year 10 months
Lustre Updates filesystem Resolved

9/10/14 - We have not seen any additional crashes of the Lustre servers since making this change.

8/26/14 - Lustre jobs are being accepted as of 10AM this...

1 year 11 months ago 1 year 10 months
Armstrong offline until Noon Armstrong Resolved

Armstrong will need to be taken down today until Noon.  In the meantime, contact OSCHelp (OSCHelp@osc.edu) for account assistance.

1 year 11 months ago 1 year 11 months
Lustre jobs suspended filesystem Resolved

The Lustre filesystem ($PFSDIR and /fs/lustre) crashed several times Friday evening (8/15). We have temporarily degraded this service while we work to isolate the actions that are triggering...

1 year 11 months ago 1 year 11 months
OnDemand, Awesim, and DB Services down morning of Feb 12 Resolved

Update: Reboot was successful. OnDemand, Awesim, and Database services are back online. Report any issues to oschelp@osc.edu.


A short reboot...

1 year 5 months ago 1 year 5 months
MVAPICH broken on Ruby Ruby Resolved

Update Monday February 16th -- Ruby MVAPICH2 build fixed.

Ruby's MVAPICH2 build has been fixed.  Please email oschelp@osc.edu with any issues.

...
1 year 5 months ago 1 year 5 months
June 7th downtime to finish at 6:30PM Connectivity, filesystem, Infrastructure, login, Login Problems, Maintenance, Operations, Outage Resolved

Update: Downtime completed at 6:30PM, June 7th.

The June 7th downtime is now slated to be completed at 6:30PM. The previous estimate was 5PM.

All systems and services will...

1 month 2 weeks ago 1 month 2 weeks
Submit filter bug after downtime Batch Resolved

A change was made to a part of our batch software during the downtime that should have only affected users who are a part of multiple projects. We have found that there is a bug in the changes...

5 months 2 weeks ago 5 months 2 weeks
Scheduling temporarily suspended on Oakley Batch Resolved

We are migrating the batch scheduler on Oakley to a new virtual machine. In order to accomplish this, the scheduler will be temporarily offline for about four hours on December 16th. Running jobs...

7 months 1 week ago 7 months 1 week
