Infrastructure

Nsight GPU profiler not working due to DCGM conflict

UPDATE (Mar 15, 2023)

After the downtime on March 14, 2023, OSC enabled a new Slurm option, --gres=nsight. When a job is submitted with this option, DCGM is disabled on the nodes allocated to that job, so Nsight functions normally.
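As a minimal sketch of how the option might be used, the snippet below writes a batch script that requests --gres=nsight and then profiles an application. Only the --gres=nsight option comes from this notice. The file name nsight_job.sh, the job name, the GPU request, the time limit, the report name my_report, and the application ./my_gpu_app are hypothetical placeholders; adjust them to your own job.

```shell
# Write a hypothetical batch script to nsight_job.sh (nothing is submitted here).
cat > nsight_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=nsight-profile   # placeholder job name
#SBATCH --nodes=1
#SBATCH --gpus-per-node=1           # assumed GPU request; adjust to your cluster
#SBATCH --gres=nsight               # option from this notice: disables DCGM for the job
#SBATCH --time=00:30:00             # placeholder time limit

# Profile a placeholder application with Nsight Systems.
nsys profile -o my_report ./my_gpu_app
EOF

# Submit it on a login node with:
#   sbatch nsight_job.sh
```

The same option can presumably also be passed on the command line, for example as sbatch --gres=nsight followed by the script name, since Slurm accepts --gres in either place.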

==================================

We are experiencing an issue with the Nsight GPU profiler: it conflicts with the GPU monitoring service (DCGM) that we are running.

This conflict causes Nsight to malfunction and produce error messages: