Monitor cluster metrics #
Starburst Galaxy exposes metrics data in OpenMetrics format, which can be read
by monitoring clients such as Prometheus or Datadog.
This page describes how to configure your Galaxy account so that an
OpenMetrics-compatible monitor client can scrape those metrics.
Limitations #
- Galaxy makes metrics available for your monitor client to pull.
Galaxy does not push metrics.
- You configure your client to monitor a single cluster per configuration job.
You can have multiple configuration jobs per monitor client.
- The cluster must be enabled and running to support metrics scraping.
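In Prometheus terms, the one-cluster-per-job limitation means one `scrape_configs` entry per cluster. The following is a minimal sketch, assuming two clusters and reusing the fields from the sample Prometheus template later on this page. The job names, label values, and `<cluster-a-url>` / `<cluster-b-url>` placeholders are hypothetical. It also assumes the same service account role has been granted the Monitor cluster privilege on both clusters.

```yaml
# Sketch only: job names, label values, and cluster hosts are hypothetical.
scrape_configs:
  # First cluster: one job, one target.
  - job_name: 'galaxy-cluster-a'
    metrics_path: /v1/metrics
    scheme: https
    basic_auth:
      username: '<service-account-username>'
      password: '<service-account-password>'
    static_configs:
      - targets: ['<cluster-a-url>']
        labels:
          cluster: 'cluster-a'

  # Second cluster: a separate job in the same Prometheus configuration.
  - job_name: 'galaxy-cluster-b'
    metrics_path: /v1/metrics
    scheme: https
    basic_auth:
      username: '<service-account-username>'
      password: '<service-account-password>'
    static_configs:
      - targets: ['<cluster-b-url>']
        labels:
          cluster: 'cluster-b'
```

Giving each job its own `cluster` label keeps the time series from different clusters distinguishable in queries.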
Summary of configuration steps #
In brief, the steps are:
- Create a unique role not used elsewhere.
- Grant that role the Monitor cluster privilege on the clusters of
interest.
- Create a Galaxy service account with that role as its default, and
create a password for the account.
- Create a configuration file for your metrics scraping app such as Prometheus.
- Connect with your metrics scraping app.
Create a dedicated role #
- In the Starburst Galaxy navigation menu, click Access > Roles and
privileges.
- Click Add role.
- In the Add a new role pane, specify a role name such as
metrics_scraper, and a description such as “Role used only for Prometheus
(or Datadog) scheduled metrics scraping.”
- For maximum security, do not select the Grant to the creating role
checkbox.
- Click Add role.
Grant privilege per cluster #
- The last step returned you to the Roles pane.
- Locate the role you just created and click its name.
- Click the Privileges tab.
- Click Add privilege.
- Click the Cluster tab.
- In the Cluster name drop-down list, select the name of a cluster you want
to monitor.
- Leave the Allow button selected and click Monitor cluster. Notice
that the Use cluster privilege is inherited by default from the
public role, so you do not need to add that privilege.
- Repeat the previous two steps for other clusters in your account that you
want to monitor.
- When done specifying clusters, click Save privileges.
Create a service account #
- In Galaxy’s navigation menu, click Access > Service
accounts.
- Click Create new service account.
- Provide a name for the service account.
To avoid confusion, do not use a person's name. Consider reusing the
name of the role created above.
- Select the name of the created role from the Default role drop-down menu.
- Select the Generate password checkbox.
- Click Create.
- In the Generate a password pane, enter a description for this password
such as “Password 1 for the metrics scraper”. You can remove and replace this
password later, as needed.
- Click Generate password. You MUST copy and preserve the generated
password from the next pane. Service account passwords cannot be retrieved
once generated.
Create a configuration file #
Consult the documentation for your OpenMetrics-compatible metrics analysis
client for the format of that client’s configuration files.
The following is a sample configuration template for Prometheus to use as a
starting point.
Note:
For the '<cluster-url>' field, look up the URL
for an individual cluster by selecting Partner connect from the Galaxy
navigation menu, then selecting the Trino Python tile. Select the cluster of
interest from the drop-down menu, then copy that cluster’s URL from the Host
field at the bottom.
global:
  scrape_interval: 15s # By default, scrape the target
                       # every 15 seconds.

# A scrape configuration containing exactly one
# endpoint to scrape: a single Galaxy cluster.
scrape_configs:
  # The job name is added as a label `job=<job_name>`
  # to any time series scraped from this config.
  - job_name: '<galaxy-cluster-name>'
    metrics_path: /v1/metrics
    scheme: https
    basic_auth:
      username: '<service-account-username>'
      password: '<service-account-password>'
    static_configs:
      - targets: ['<cluster-url>']
        labels:
          cluster: '<cluster-name>'
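Because the service account password cannot be retrieved after it is generated, you may prefer to keep it out of the main configuration file. The following is a minimal sketch of the same job using the `password_file` option of Prometheus `basic_auth` in place of an inline `password`. The file path is a hypothetical example, and the file is assumed to contain only the password.

```yaml
# Sketch only: the password file path below is a hypothetical example.
scrape_configs:
  - job_name: '<galaxy-cluster-name>'
    metrics_path: /v1/metrics
    scheme: https
    basic_auth:
      username: '<service-account-username>'
      # Prometheus reads the password from this file. Use this instead of
      # `password`; the two options are mutually exclusive.
      password_file: /etc/prometheus/secrets/galaxy-metrics-password
    static_configs:
      # Host only, without an https:// prefix; the scheme is set above.
      - targets: ['<cluster-url>']
        labels:
          cluster: '<cluster-name>'
```

Before restarting Prometheus, you can check the file for syntax errors with `promtool check config <path-to-config-file>`.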
For configuration details, consult the documentation for Prometheus or Datadog.