We are experiencing network performance issues on a cluster of servers that provide GPFS and several project filesystems. GPFS appears to be functioning acceptably, but proj01, proj02, proj03, proj08, and proj09 are not. Compute nodes writing to these filesystems are seeing very slow write speeds.
The root cause has been identified as a damaged fiber optic cable. We will be replacing this cable and expect an outage of less than one minute for the affected hosts.
UPDATE: The cable has been replaced, and performance has returned to normal.
Over the past two weeks, we have experienced crashes of the Oakley login nodes, possibly caused by a Lustre bug.