3:35 PM 4/30/2018 Update:
The cause was that NFSv4.1 was not configured correctly after the OS on Owens was updated from RHEL 7.3 to 7.4. We rebooted the Owens compute nodes using NFSv4.0 with the correct configuration, which fixed the problem. Please contact firstname.lastname@example.org if you have any questions.
9:30 AM 4/18/2018 Original Post:
Users may have been experiencing job failures on the Owens cluster since April 16, 2018. Some Owens nodes fail to pick up the new filesystem after being rebooted and simply hang once users' jobs have been allocated to them. Your job may be impacted by this issue if it reports almost zero CPU usage.
We are actively investigating the issue and will update the community as more is known. We apologize for any inconvenience this may cause you. Please contact email@example.com if you have any questions.