Over the past two weeks we have experienced Oakley login node crashes, potentially caused by a Lustre bug.

Lustre, InfiniBand Operational and Being Monitored Closely

Resolution: Resolved

UPDATE: Most users should no longer see any issues with Lustre.


Again, please continue to notify OSC Help of any errors you see in your job output, such as "IBV_EVENT_PORT_ERR". Prompt reports help the Operations staff limit the effects of any issues.
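
As a rough illustration of checking job output for the error string quoted above, here is a minimal Python sketch. The function names and the "*.o*" file pattern are assumptions made for this example, not an OSC-provided tool; adjust the pattern to match how your own job output files are named.

```python
from pathlib import Path

# Error string quoted in the notice; other messages may also be worth reporting.
ERROR_MARKER = "IBV_EVENT_PORT_ERR"


def find_marker(text, marker=ERROR_MARKER):
    """Return (line_number, line) pairs for lines of text that contain marker."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if marker in line
    ]


def scan_job_outputs(directory=".", pattern="*.o*"):
    """Map each matching output file to the marker lines found in it.

    The "*.o*" pattern is an assumption about job output file names.
    """
    hits = {}
    for path in sorted(Path(directory).glob(pattern)):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a file has odd bytes.
        matches = find_marker(path.read_text(errors="replace"))
        if matches:
            hits[str(path)] = matches
    return hits


if __name__ == "__main__":
    for filename, matches in scan_job_outputs().items():
        for number, line in matches:
            print(f"{filename}:{number}: {line}")
```

If the script prints anything, including the file name and the matching lines in your message to OSC Help should make the report easier to act on.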

We apologize for the disruption. We work hard to avoid these incidents, but sometimes they do happen. We appreciate your patience.
