We will perform a rolling reboot of the login and compute nodes of the Owens cluster starting Monday, April 16, 2018.

Users may have been experiencing job failures on the Owens cluster since April 16, 2018.

Nvidia drivers on Oakley

Category: 
Resolution: Unresolved

We upgraded the drivers for the Nvidia GPUs on all of our clusters during the downtime this week. Unfortunately, we are noticing some subtle problems with the GPUs on Oakley. We will be rolling back to an older driver on that cluster; the GPUs will be unavailable until that work is completed, potentially all weekend.