Over the past two weeks we have experienced Oakley login node crashes potentially caused by a Lustre bug.

Brief disruption on 8/1/2013 at 8AM

Category: 
Resolution: Resolved

At 8AM on 8/1/2013, we will be replacing some faulty hardware in our network infrastructure. Unfortunately, this work cannot be delayed until the next downtime, and the replacement will cause a short disruption of network services for our compute nodes. Jobs may temporarily hang if they are attempting to access network-provided storage or communicate between nodes. A few jobs may fail to complete properly, but only under a very specific set of circumstances. We expect that most jobs will simply pause until the network becomes available again.

We will also be making some configuration changes to the Intel compilers, which will briefly require turning off the license server. Additionally, the MATLAB license server will see a brief interruption.

Finally, we will be fixing a bug that prevents pbsdcp from working correctly with all versions of MVAPICH2 on Oakley.

Please contact OSC Help if you have any questions or concerns.