I get a lot of questions from teachers.
Many at the moment are around the performance and optimisation of cloud services, or simply understanding what students are doing with the resources.
Many are specifically about the sizing and management of Azure GPUs being used in the teaching of DNN, ML and AI:
https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu
The most common is ‘what’s the best practice for monitoring GPU cores/RAM usage on N-series DSVM(s)?’
There are answers such as logging into each VM and running “watch nvidia-smi”, but this simply isn’t scalable and is complex to manage across an estate of machines or clusters.
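To make the per-machine approach concrete, here is a minimal sketch of what polling nvidia-smi from Python might look like on one VM. The function names parse_gpu_csv and sample_gpus are my own for illustration and are not part of any tool mentioned in this post; the --query-gpu fields and --format options are standard nvidia-smi flags.

```python
import csv
import io
import subprocess

# Standard nvidia-smi query fields; this order fixes the column order of the output.
FIELDS = ["index", "utilization.gpu", "memory.used", "memory.total"]


def _to_int(value):
    """Convert a field to int, or None when nvidia-smi reports e.g. [N/A]."""
    try:
        return int(value)
    except ValueError:
        return None


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:
            continue  # skip blank lines
        index, util, used, total = (_to_int(v) for v in record)
        rows.append({
            "gpu": index,
            "util_pct": util,
            "mem_used_mib": used,
            "mem_total_mib": total,
        })
    return rows


def sample_gpus():
    """Run nvidia-smi once on the local machine (hypothetical helper)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)
```

Calling sample_gpus() on a machine with the NVIDIA driver installed returns the current utilisation and memory figures for each GPU. It still only sees the one machine it runs on, which is exactly the limitation described above.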
So the request is: how can I do this simply and have a nice visual of usage across my class or cohort?
Wouldn’t it be great to have a single view of the utilisation in some sort of dashboard?
Well, you now can, thanks to my Microsoft colleagues Mathew Salvaris and Miguel Fierro. They’ve created an app for monitoring GPUs on a single machine and across a cluster.
You can use it to record various GPU measurements during a specific period using the context-based loggers, or continuously using the gpumon CLI command. The context logger can record either to a file, which can be read back into a dataframe, or to an InfluxDB database.
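To illustrate the idea of a context-based logger that records to a file, here is a minimal sketch of the pattern. This is not gpumon’s actual API: log_to_file, read_log and the sampler argument are hypothetical names of my own, and the JSON-lines file format is an assumption. See the project’s notebook for the real interface.

```python
import json
import threading
import time
from contextlib import contextmanager


@contextmanager
def log_to_file(path, sampler, interval=1.0):
    """Hypothetical context logger: while the with-block runs, call sampler()
    every `interval` seconds in a background thread and append each result
    to `path` as one JSON line."""
    stop = threading.Event()

    def worker():
        with open(path, "a") as fh:
            while True:
                record = {"time": time.time(), "gpus": sampler()}
                fh.write(json.dumps(record) + "\n")
                fh.flush()
                # wait() returns True once stop is set, which ends the loop
                if stop.wait(interval):
                    break

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    try:
        yield
    finally:
        stop.set()
        thread.join()


def read_log(path):
    """Read the JSON-lines log back as a list of dicts."""
    with open(path) as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

You would wrap a training run in "with log_to_file(path, sampler):", where sampler is any function returning the current GPU readings. If pandas is installed, the same file can be loaded into a dataframe with pandas.read_json(path, lines=True).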
Data from the InfluxDB database can then be accessed using the Python InfluxDB client, or viewed in real time using dashboards such as Grafana.
They have a great example, which is available as a Jupyter notebook and can be found here
Below is an example dashboard using the InfluxDB log context and a Grafana dashboard
You can download the install and source from https://github.com/msalvaris/gpu_monitor