UPDATE (Mar 15, 2023)
After the downtime on Mar. 14, 2023, OSC enabled a new Slurm option, --gres=nsight. For jobs that request this option, DCGM is disabled on the job's nodes, and Nsight functions normally.
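As a minimal sketch, a batch script requesting the option might look like the following. Only --gres=nsight comes from this notice. The GPU request line, the `cuda` module name, the application `./my_app`, and the report name `my_profile` are hypothetical placeholders, and the exact GPU request syntax may differ between clusters.

```shell
#!/bin/bash
#SBATCH --job-name=nsight-profile
#SBATCH --nodes=1
# Assumption: GPU request syntax; check the cluster's documentation.
#SBATCH --gpus-per-node=1
# Disables DCGM on this job's nodes so Nsight can collect profiling data.
#SBATCH --gres=nsight
#SBATCH --time=00:30:00

# Hypothetical module name; load whichever module provides Nsight on your cluster.
module load cuda

# Profile a hypothetical application with the Nsight Compute CLI,
# writing the report to a file named after "my_profile".
ncu -o my_profile ./my_app
```

The same option can also be given on the command line instead of in the script, for example `sbatch --gres=nsight profile_job.sh` (the script name is illustrative).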
We are experiencing an issue with the Nsight GPU profiler, which conflicts with the GPU monitoring service (DCGM) that we run on our GPU nodes.
This causes Nsight to fail with the following error message:
==ERROR== Profiling failed because a driver resource was unavailable. Ensure that no other tool (like DCGM) is concurrently collecting profiling data. See https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#faq for more details.
We are looking for a workaround to resolve this issue.
Please contact firstname.lastname@example.org if there are questions.