We will perform a rolling reboot of the login and compute nodes of the Owens cluster starting Monday, April 16, 2018.

Users may have been experiencing job failures on the Owens cluster since April 16, 2018.

Lustre, InfiniBand Operational and Being Monitored Closely

Category: 
Resolution: Resolved

UPDATE: Most users should no longer see any issues with Lustre.


Again, please continue to notify OSC Help of any errors you see in job output. For example, you might see "IBV_EVENT_PORT_ERR" in your job output. Reporting errors to the helpdesk quickly helps the Operations staff limit the effects of any issues.
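
If you have many job output files to check, a short script can search them for the error string quoted above. The following is only a minimal sketch, not an OSC-provided tool: the function name find_marker is hypothetical, and the "*.o*" file pattern is an assumption about how your job output files are named, so adjust it to match your own jobs.

```python
from pathlib import Path

# Error string quoted in the notice above.
ERROR_MARKER = "IBV_EVENT_PORT_ERR"


def find_marker(paths, marker=ERROR_MARKER):
    """Return (path, line_number, line) for every line that contains marker."""
    hits = []
    for path in paths:
        # errors="replace" keeps the scan going if a file contains stray bytes.
        with open(path, "r", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if marker in line:
                    hits.append((str(path), number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Assumption: job output files are in the current directory and match "*.o*".
    candidates = sorted(p for p in Path(".").glob("*.o*") if p.is_file())
    for path, number, line in find_marker(candidates):
        print(f"{path}:{number}: {line}")
```

Run it from the directory that holds your job output, and include any matching lines when you contact OSC Help.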

We apologize for the disruption. We work hard to avoid these incidents, but sometimes they do happen. We appreciate your patience.