Monitoring
Note
This article covers monitoring of the local machine only. For information about monitoring multiple machines, see here.
Overview
Genv supports system and resource monitoring using Prometheus and Grafana.
This is done with the Genv monitoring service, which collects metrics about the system and its resources and exports them in Prometheus format.
The monitoring service provides default configuration files for Prometheus and Grafana, as well as a default Grafana dashboard, so everything works out of the box.
Quick start
This guide walks through getting started with the monitoring features in Genv.
Prerequisites
First, install the prometheus-client PyPI package:
pip install prometheus-client
Note
This is installed automatically when installing Genv with pip install genv[monitor]
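A quick way to confirm the prerequisite is in place is to check that the package can be located by Python. The following is a minimal sketch, not part of Genv; the helper name is_installed is hypothetical, and it relies on prometheus_client being the import name of the prometheus-client package.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Report whether the Prometheus client library is available.
print("prometheus_client installed:", is_installed("prometheus_client"))
```

If this prints False, run the pip command above in the same Python environment that runs Genv.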
Running the monitoring service
Now, start the monitoring service using the following command:
genv monitor
Note
genv monitor acts as a foreground daemon and runs until it receives a Ctrl+C.
Therefore, you need to keep the terminal open while monitoring the system.
Prometheus
First, download the Prometheus precompiled binaries.
Then, open another terminal, extract the archive and enter the extracted directory:
tar xvfz prometheus-*.tar.gz
cd prometheus-*/
The Genv monitoring service publishes a configuration file for Prometheus.
By default, it is published at /var/tmp/genv/metrics/prometheus/prometheus.yml.
You can see its contents using cat:
$ cat /var/tmp/genv/metrics/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: genv
    static_configs:
      - targets: ['localhost:8000']
This tells Prometheus to scrape the Genv exporter, which is available at port 8000 thanks to the monitoring service we started with the command genv monitor.
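Before starting Prometheus, it can help to check that the exporter is actually serving metrics. The sketch below fetches and parses the Prometheus text exposition format using only the standard library. It is an illustration, not part of Genv: the /metrics path is an assumption (it is the conventional path for Prometheus exporters), and the function names are hypothetical.

```python
import re
import urllib.request

# Matches a sample line: metric name, optional {labels}, then the value.
SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)')


def parse_metrics(text):
    """Return (name, labels_text, value) tuples from Prometheus text format."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if match:
            name, labels, value = match.groups()
            samples.append((name, labels or "", float(value)))
    return samples


def fetch_metrics(url="http://localhost:8000/metrics", timeout=5):
    """Download the raw metrics text from the exporter (assumed URL)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage, while `genv monitor` is running in another terminal:
#   for name, labels, value in parse_metrics(fetch_metrics()):
#       print(name, labels, value)
```

Keeping the parsing separate from the network call makes it possible to try the parser on saved output without a running exporter.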
Now, let’s run Prometheus and specify the configuration file path:
./prometheus --config.file=/var/tmp/genv/metrics/prometheus/prometheus.yml
Now, you can open your browser at http://localhost:9090 and access Genv metrics.
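Besides the browser UI, Prometheus can be queried over its HTTP API. The following sketch runs an instant query against the local server started above. It assumes Prometheus listens on localhost:9090 as in this guide; the expression up{job="genv"} uses the built-in up metric together with the job name from the configuration file, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(expr, base="http://localhost:9090"):
    """Build the instant-query URL for a PromQL expression."""
    return base + "/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def extract_values(response_body):
    """Return (labels, value) pairs from an instant-query JSON response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError("query failed: %s" % payload.get("error"))
    # Each vector result holds a label set and a [timestamp, "value"] pair.
    return [(item["metric"], float(item["value"][1]))
            for item in payload["data"]["result"]]


def query(expr, base="http://localhost:9090", timeout=5):
    """Run an instant query against a running Prometheus server."""
    url = build_query_url(expr, base)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return extract_values(response.read().decode("utf-8"))


# Usage, while Prometheus is running:
#   print(query('up{job="genv"}'))  # a value of 1.0 means the scrape succeeded
```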
Grafana
First, open another terminal and download and extract the Grafana precompiled binaries. Then, enter the directory:
cd grafana-*/
The Genv monitoring service publishes a configuration file for Grafana.
By default, it is published at /var/tmp/genv/metrics/grafana/grafana.ini.
You can see its contents using cat:
$ cat /var/tmp/genv/metrics/grafana/grafana.ini
[auth.anonymous]
enabled = true
org_name = Main Org.
org_role = Viewer
[paths]
provisioning = /var/tmp/genv/metrics/grafana/provisioning
[dashboards]
default_home_dashboard_path=/var/tmp/genv/metrics/grafana/dashboards/overview.json
This essentially tells Grafana where its datasources and dashboards are, as well as configures the default dashboard.
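To see which paths the configuration points at, the file can also be read programmatically. The sketch below parses the INI content with Python's standard configparser module and extracts the settings discussed above. It is illustrative only; the function name load_grafana_settings is hypothetical.

```python
import configparser


def load_grafana_settings(ini_text):
    """Extract the provisioning path, home dashboard and anonymous-access flag."""
    # Disable interpolation so that any '%' in a value is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return {
        "provisioning": parser.get("paths", "provisioning"),
        "home_dashboard": parser.get("dashboards", "default_home_dashboard_path"),
        "anonymous_enabled": parser.getboolean("auth.anonymous", "enabled"),
    }


# Usage with the file published by the monitoring service:
#   with open("/var/tmp/genv/metrics/grafana/grafana.ini") as f:
#       print(load_grafana_settings(f.read()))
```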
As mentioned before, the Genv monitoring service also provides a Prometheus data source as well as dashboards.
You can see the contents of /var/tmp/genv/metrics/grafana using find:
$ find /var/tmp/genv/metrics/grafana
/var/tmp/genv/metrics/grafana
/var/tmp/genv/metrics/grafana/dashboards
/var/tmp/genv/metrics/grafana/dashboards/overview.json
/var/tmp/genv/metrics/grafana/provisioning
/var/tmp/genv/metrics/grafana/provisioning/datasources
/var/tmp/genv/metrics/grafana/provisioning/datasources/default.yml
/var/tmp/genv/metrics/grafana/provisioning/dashboards
/var/tmp/genv/metrics/grafana/provisioning/dashboards/default.yml
/var/tmp/genv/metrics/grafana/grafana.ini
Now, let’s run Grafana and specify the configuration file path:
./bin/grafana-server --config /var/tmp/genv/metrics/grafana/grafana.ini web
Now, you can open your browser at http://localhost:3000 and see the Genv dashboard.
Permissions
The monitoring service needs to query the environment variables of processes to determine their Genv environment identifier.
Linux users usually can't query the environment variables of processes owned by other users.
Therefore, you will probably need to run genv monitor commands using sudo, with a command similar to the following:
sudo genv monitor ...
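The restriction can be illustrated with a short sketch that reads the environment of a process from /proc on Linux. This is not Genv's implementation, only an example of the underlying mechanism: the function names are hypothetical, and GENV_ENVIRONMENT_ID in the usage comment is an assumed variable name used purely for illustration.

```python
def parse_environ(raw):
    """Parse the NUL-separated contents of /proc/<pid>/environ into a dict."""
    env = {}
    for entry in raw.split(b"\0"):
        # Skip empty trailing entries and malformed items without '='.
        if b"=" not in entry:
            continue
        key, _, value = entry.partition(b"=")
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def read_process_environ(pid):
    """Return a process's environment, or None if it cannot be read."""
    try:
        with open("/proc/%d/environ" % pid, "rb") as f:
            return parse_environ(f.read())
    except (PermissionError, FileNotFoundError, ProcessLookupError):
        # Typically another user's process (no permission) or an exited one.
        return None


# Usage (assumed variable name, for illustration only):
#   env = read_process_environ(1234)
#   print(None if env is None else env.get("GENV_ENVIRONMENT_ID"))
```

Without elevated privileges, read_process_environ returns None for processes owned by other users, which is why the monitoring service is usually run with sudo.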
Running as a daemon
genv monitor acts as a foreground daemon and runs until it receives a Ctrl+C.
Therefore, you need to keep the terminal open while monitoring the system.
When monitoring a GPU machine or a cluster of GPU machines, you might want to run the monitoring for long periods of time, such as days or even weeks. To do so, the Genv monitoring daemon should not be attached to a specific terminal session, so that it continues running when the session exits.
We recommend using tmux for this.
Here is an example of how to use tmux to run genv monitor in the background.
Create a new tmux session named genv-monitor with the command:
tmux new -s genv-monitor
Run genv monitor inside the session:
genv monitor
Detach from the session with Ctrl-b + d.
Later, you can reattach to the session with the command:
tmux attach -t genv-monitor
Reference
Metric | Labels | Description |
---|---|---|
| | Genv installation status |
| | Device temperature in degrees C |
| | Device utilization |
| | Device used memory in bytes |
| | Device total memory in bytes |
| | Number of active environments |
| | Number of running processes |
| | Number of attached devices |
| | Number of active users |
| | Number of running processes in an environment |
| | Number of attached devices of an environment |
| | Number of devices used by a process |
| | Used GPU memory by a process |
| | Number of active environments of a user |
| | Number of running processes of a user |
| | Number of attached devices of a user |