|Emergency InfiniBand Shutdown (All systems)||Network||Resolved||
We have returned to service. It appears that we have resolved the networking issues enough to allow jobs to run safely. We will continue working with our vendors to fix any remaining hardware...
|7 months 1 week ago||7 months 6 days|
|February 11 2014 Scheduled Downtime||Outage||Resolved||
HPC systems are offline today for scheduled quarterly maintenance activity. For details, please visit osc.edu/n
|1 year 3 weeks ago||1 year 2 weeks|
|Login Shell Issues on Oakley||Account/Shell||Resolved||
UPDATE: The shells have all been switched back for affected users, and you can submit jobs normally again. Additionally, if you are still logged in and have the incorrect shell, logging back out...
|8 months 2 weeks ago||8 months 2 weeks|
|Lustre is still offline. HPC systems back up||Maintenance||Resolved||
Day One of the scheduled downtime has been completed, and HPC operations have resumed. As planned, Lustre work will extend into Day Two. Jobs using /fs/lustre or $PFSDIR cannot run until this work...
|7 months 4 weeks ago||7 months 3 weeks|
|Lustre jobs suspended||filesystem||Resolved||
The Lustre filesystem ($PFSDIR and /fs/lustre) crashed several times on Friday evening (8/15). We have temporarily degraded this service while we work to isolate the actions that are triggering...
|6 months 3 weeks ago||6 months 1 week|
9/10/14 - We have not seen any additional crashes of the Lustre servers since making this change.
|6 months 1 week ago||5 months 3 weeks|
|Lustre, InfiniBand Operational and Being Monitored Closely||filesystem||Resolved||
UPDATE: Most users should no longer see any issues with Lustre.
Again, please continue to notify OSC Help of any errors you see in job output. For example, you might see "...
|7 months 5 days ago||6 months 3 weeks|
|Maintenance for OnDemand and other web-based services||Resolved||
Update (12/13/14 10am): Maintenance has finished as planned.
OnDemand, AweSim applications, and other web-based services will be down starting Wednesday, January 31 at 8:30AM for...
|2 months 4 days ago||2 months 3 days|
|MVAPICH broken on Ruby||Ruby||Resolved||
Update Monday February 16th -- Ruby MVAPICH2 build fixed.
Ruby's MVAPICH2 build has been fixed. Please email email@example.com with any issues....
|2 weeks 6 days ago||2 weeks 2 days|
|my.osc.edu logins failing||Account Management||Resolved||
Logins to my.osc.edu are failing. This is unrelated to our InfiniBand issue; the cause is believed to be a router change at OARnet. They are working on re-establishing the necessary routing.
|7 months 5 days ago||7 months 5 days|