Update: 2PM March 12th: Armstrong is back up and running. Please notify firstname.lastname@example.org of any lingering issues.
As of 10AM Thursday March 12th... (Read more)
|2 months 2 weeks ago||2 months 2 weeks|
|Statewide Intel compiler license checkout failures||Licensing||Resolved||
This morning (9/10/14) we updated our Intel compiler licenses. We are seeing some unexpected license checkout failures in the logs (please click through to see details):
10:44:... (Read more)
|8 months 2 weeks ago||8 months 6 days|
9/10/14 - We have not seen any additional crashes of the Lustre servers since making this change.
|9 months 6 days ago||8 months 2 weeks|
|Armstrong offline until Noon||Armstrong||Resolved||
Armstrong must be taken offline today until noon. In the meantime, contact OSC Help (OSCHelp@osc.edu) for account assistance.
|9 months 1 week ago||9 months 1 week|
|Lustre jobs suspended||filesystem||Resolved||
The Lustre filesystem ($PFSDIR and /fs/lustre) crashed several times on Friday evening (8/15). We have temporarily degraded this service while we work to isolate the actions that are triggering... (Read more)
|9 months 2 weeks ago||9 months 6 days|
|Issue with OnDemand 6:09 - 8:39 pm||Resolved||
OnDemand, epi accounting queries, the Viper DB, the Medline DB, the Eweld DB,... (Read more)
|9 months 2 weeks ago||9 months 2 weeks|
|my.osc.edu logins failing||Account Management||Resolved||
Logins to my.osc.edu are failing. This is unrelated to our InfiniBand issue; we believe a router change at OARnet is the cause. OARnet is working on re-establishing the necessary routing.
|10 months 12 hours ago||10 months 11 hours|
|Lustre, InfiniBand Operational and Being Monitored Closely||filesystem||Resolved||
UPDATE: Most users should no longer see any issues with Lustre.
Again, please continue to notify OSC Help of any errors you see in job output. For example, you might see "... (Read more)
|10 months 12 hours ago||9 months 2 weeks|
|Emergency InfiniBand Shutdown (All systems)||Network||Resolved||
We have returned to service. It appears that we have resolved the networking issues sufficiently to allow jobs to run safely. We will continue working with our vendors to fix any remaining hardware... (Read more)
|10 months 2 days ago||10 months 1 day|
|Certain modules not accessible||Software||Resolved||
Certain modules have not been working for all clusters since the downtime. Specifically, we have reports that Amber, Gaussian, and Turbomole are not working. We are working to resolve the issue, but... (Read more)
|10 months 3 weeks ago||10 months 3 weeks|