Understanding your GPU Efficiency on Azure with GPU Monitor – Microsoft School Connection


I get a lot of questions from teachers.

Many at the moment are around performance and optimisation of cloud services, or simply understanding what students are doing with the resources.

Many are specifically about the sizing and management of Azure GPUs being used in the teaching of DNN, ML and AI: https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu

The most common is 'what's the best practice for monitoring GPU cores/RAM usage on N-series DSVM(s)?'

There are answers like logging into each VM and running "watch nvidia-smi", but this simply isn't scalable and is complex to manage across an estate of machines or clusters.
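To make the per-VM approach concrete, here is a minimal sketch of polling nvidia-smi from Python instead of watching it by hand. The `--query-gpu` and `--format=csv,noheader,nounits` options are standard nvidia-smi flags; the function names `parse_gpu_csv` and `query_gpus` are my own and purely illustrative. It still has to run on every machine, which is exactly the scaling problem described above.

```python
import subprocess

# Fields requested from nvidia-smi; memory values are reported in MiB.
QUERY_FIELDS = ["index", "utilization.gpu", "memory.used", "memory.total"]


def _to_int(value):
    """Convert a CSV cell to int, or None for entries such as '[N/A]'."""
    try:
        return int(value)
    except ValueError:
        return None


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts."""
    rows = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(QUERY_FIELDS):
            continue  # skip lines that do not match the requested fields
        rows.append(dict(zip(QUERY_FIELDS, (_to_int(v) for v in values))))
    return rows


def query_gpus():
    """Run nvidia-smi once on the local machine and return one dict per GPU."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)
```

Calling `query_gpus()` on an N-series VM should return one dictionary per GPU; on a machine without the NVIDIA driver it will raise an error because the `nvidia-smi` command is missing.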

So the request is: how can I do this simply and have a nice visual of usage across my class or cohort?

Wouldn't it be great to have a single view of the utilisation in some sort of dashboard visual?

Well, you now can, thanks to Microsoft colleagues Mathew Salvaris and Miguel Fierro. They've created an app for monitoring GPUs on a single machine and across a cluster.

You can use it to record various GPU measurements during a specific period using the context-based loggers, or continuously using the gpumon CLI command. The context logger can either record to a file, which can be read back into a dataframe, or to an InfluxDB database.
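To illustrate the idea of a context-based logger, here is a rough sketch of the pattern using only the Python standard library. This is not gpumon's actual API: `file_log_context` and its `sample_fn` parameter are hypothetical names I have made up for illustration, and the real package should be consulted for its own imports and signatures. The sketch samples in a background thread for as long as the `with` block runs and writes each sample as a CSV row.

```python
import csv
import threading
import time
from contextlib import contextmanager


@contextmanager
def file_log_context(path, sample_fn, interval=1.0):
    """Hypothetical helper (not gpumon's API).

    Calls sample_fn() every `interval` seconds while the with-block runs
    and appends each returned dict to a CSV file at `path`.
    """
    stop = threading.Event()

    def worker():
        with open(path, "w", newline="") as f:
            writer = None
            while True:
                row = {"timestamp": time.time(), **sample_fn()}
                if writer is None:
                    # Header is taken from the keys of the first sample.
                    writer = csv.DictWriter(f, fieldnames=list(row))
                    writer.writeheader()
                writer.writerow(row)
                f.flush()
                # wait() returns True once the with-block has finished.
                if stop.wait(interval):
                    break

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    try:
        yield
    finally:
        stop.set()
        thread.join()
```

You would wrap a training run in `with file_log_context("gpu_log.csv", my_sampler):`, where `my_sampler` is any function returning a dict of measurements, and afterwards the file can be read back into a dataframe with `pandas.read_csv("gpu_log.csv")`.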

Data from the InfluxDB database can then be accessed using the Python InfluxDB client or viewed in real time using dashboards such as Grafana.
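As a sketch of the Python client route, the snippet below pulls the last few minutes of readings using the `influxdb` package (the InfluxDB 1.x client, installed with `pip install influxdb`). The host, database, measurement and field names are placeholders I have assumed; use whatever names your own logger was configured to write.

```python
def build_recent_query(measurement, field, minutes=15):
    """Build an InfluxQL query for the last `minutes` of one field."""
    return (
        f'SELECT "{field}" FROM "{measurement}" '
        f"WHERE time > now() - {int(minutes)}m"
    )


def fetch_recent(host, database, measurement, field, minutes=15, port=8086):
    """Return recent points as a list of dicts (requires the influxdb package)."""
    # Imported here so the query builder can be used without the package.
    from influxdb import InfluxDBClient

    client = InfluxDBClient(host=host, port=port, database=database)
    result = client.query(build_recent_query(measurement, field, minutes))
    return list(result.get_points())
```

For example, `fetch_recent("localhost", "gpudata", "gpu", "gpu_util")` would return the recent points for a hypothetical `gpu_util` field, and the same query text can be pasted into a Grafana panel backed by the same database.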

They have a great example, which is available as a Jupyter notebook.

Below is an example dashboard using the InfluxDB log context and a Grafana dashboard.


You can download the installation and source from https://github.com/msalvaris/gpu_monitor





