Performance Regression of GPU Nodes on Ruby
We are currently seeing a performance regression on Ruby's GPU nodes. Some of the GPU nodes remain in a power-saving state even after an application begins using them, reducing performance in some cases. We have placed a reservation on the GPU nodes so that we can perform a rolling reboot and return them to a known-good state.

We have opened a bug report with the vendor about this performance regression and how to monitor for it.
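As a rough illustration (assuming the nodes use NVIDIA GPUs with the standard `nvidia-smi` tool, which this notice does not specify), one way to spot a GPU stuck in a power-saving state is to compare its performance state and clocks while a job is running:

```shell
# Query each GPU's performance state (pstate) and its current vs. maximum
# SM clock. Under load, a healthy GPU should report P0 with clocks.sm near
# clocks.max.sm; a GPU stuck in power saving stays in a lower state (e.g.
# P8) with a low SM clock.
nvidia-smi --query-gpu=index,pstate,clocks.sm,clocks.max.sm --format=csv
```

This is only a sketch of a check, not the vendor's recommended monitoring method, which is still being worked out through the bug report.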