The Hyperlogos

nvidia instrumentation lies, a lot

By drink | Sun February 09, 2025

nVidia Corp. has emerged as the de facto standard for both 3D graphics and GPGPU (today, mostly "AI") for a variety of reasons, including performance and ease of development. But they are irritating in multiple ways, and I've just discovered another one that probably everyone knew about but me.

nVidia provides some tools for monitoring GPUs. One of them is nvidia-smi, which does a variety of things: it prints a quick overview of the current state, including which programs are accessing the GPU and how much VRAM each one is using, and it can also display or set specific parameters. I used that last capability to write a script called nvstats.pl, which reports lots of GPU information at once. While looking at its output at a time when I wasn't really using the GPU, I noticed the GPU utilization percentage climbing as high as 50%. This made me wonder whether something was running that I didn't know about.
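
For anyone who wants the same raw numbers, here is a minimal Python sketch that asks nvidia-smi for the three fields discussed in this post. It is not the author's nvstats.pl (a Perl script not shown here); it assumes nvidia-smi is on the PATH and relies on its --query-gpu and --format=csv,noheader,nounits options. The helper names query_gpus and parse_query_output are hypothetical.

```python
import subprocess
from typing import Dict, List, Optional

# Fields named in this post; `nvidia-smi --help-query-gpu` lists the full set.
FIELDS = ["utilization.gpu", "clocks.current.graphics", "clocks.max.graphics"]


def parse_query_output(text: str, fields: List[str]) -> List[Dict[str, Optional[float]]]:
    """Parse csv,noheader,nounits output: one line per GPU, values in field order."""
    gpus = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(fields):
            raise ValueError(f"expected {len(fields)} values, got {line!r}")
        reading: Dict[str, Optional[float]] = {}
        for name, value in zip(fields, values):
            try:
                reading[name] = float(value)
            except ValueError:
                # Unavailable readings (e.g. "[N/A]") are stored as None.
                reading[name] = None
        gpus.append(reading)
    return gpus


def query_gpus(fields: List[str] = FIELDS) -> List[Dict[str, Optional[float]]]:
    """Run nvidia-smi once and return one dict of readings per GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(fields)}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_query_output(result.stdout, fields)
```

On a machine with one GPU, `query_gpus()` should return a one-element list of dicts keyed by the field names; parsing is kept separate from the subprocess call so it can be checked without a GPU present.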

In response, I ran nvtop, which provides a display similar to the top command, including a graph of current GPU and VRAM use. That was even weirder: the graph showed the same kind of utilization I had been seeing, but no process on the GPU was using more than a few percent of its capacity. By watching the output of both commands, I finally figured out what was happening: every indicator except the per-process GPU utilization percentage is based on the current GPU clock rate.

This is, to say the least, really stupid. Basing these numbers on the current clock rate is not immensely stupid in itself, because you can pin the clock. However, it is horribly stupid for nvtop to do it for every number except per-process utilization, because that produces inconsistent output. I would also argue that doing it at all is misleading: unless you have pinned the GPU clock to its maximum, getting accurate numbers as a percentage of the GPU's actual capability requires scaling the reported percentages by the current GPU clock rate divided by the maximum default GPU clock rate.
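
Written as a formula, with U_reported the percentage shown by nvidia-smi or nvtop, f_cur the current graphics clock and f_max the maximum graphics clock (the symbols are chosen here for illustration, not taken from nVidia's documentation), the correction described above is:

```latex
U_{\text{adjusted}} = U_{\text{reported}} \times \frac{f_{\text{cur}}}{f_{\text{max}}}
```

When the clock is pinned at its maximum, the ratio is 1 and the reported figure stands unchanged.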

So, let's say you're running nvstats.pl and you see the following (with the irrelevant lines excluded):

utilization.gpu [%]: 35 %
clocks.current.graphics [MHz]: 675 MHz
clocks.max.graphics [MHz]: 3105 MHz

The actual GPU utilization is only 35 × 675 / 3105, or about 7.6%. Nothing unexpected is using my GPU; nVidia is simply terrible at reporting.
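
The same arithmetic can be applied directly to output in the format shown above. This Python sketch assumes every line looks like `name [unit]: value unit`, a format inferred only from the three lines in the excerpt; parse_stats and clock_adjusted_utilization are hypothetical names, not part of nvstats.pl.

```python
import re
from typing import Dict

# Matches lines such as "clocks.max.graphics [MHz]: 3105 MHz".
# Format inferred from the excerpt; non-numeric values (e.g. "[N/A]") are skipped.
LINE_RE = re.compile(r"^(?P<name>[\w.]+) \[[^\]]*\]: (?P<value>[\d.]+)")


def parse_stats(text: str) -> Dict[str, float]:
    """Collect the numeric value of every 'name [unit]: value ...' line."""
    stats = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            stats[match["name"]] = float(match["value"])
    return stats


def clock_adjusted_utilization(stats: Dict[str, float]) -> float:
    """Scale reported utilization by current graphics clock / max graphics clock."""
    return (stats["utilization.gpu"]
            * stats["clocks.current.graphics"]
            / stats["clocks.max.graphics"])


excerpt = """\
utilization.gpu [%]: 35 %
clocks.current.graphics [MHz]: 675 MHz
clocks.max.graphics [MHz]: 3105 MHz
"""
print(f"{clock_adjusted_utilization(parse_stats(excerpt)):.1f}%")  # prints 7.6%
```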

linux
nVidia

Copyright © 2025 Martin Espinoza - All rights reserved