Insights usage metrics#

Access the usage metrics tab in Starburst Insights by selecting the Usage metrics item in the left-hand navigation.

The tab provides an overview of cluster usage over a period of time, along with a cost estimation tool. You can also download the usage data or send it to Starburst from this tab.

Setup#

To access usage metrics through this tab, you must configure the event logger and enable Insights to use persistent data by setting:

insights.persistence-enabled=true

You must also grant users and/or groups the right to access the Usage metrics feature, as described in Insights configuration. See examples in Authorization examples.

Date filter#

Usage metrics

The time range for which the usage metrics are shown can be selected using a filter that provides the following options:

  • Today

  • Current month

  • Last month

  • Current year

  • Last year

  • All available dates

  • Custom range

Selecting the custom range option allows specifying a start and end date:

Usage metrics date range

A summary of the usage for the selected time range is shown in the pane along with a chart depicting cumulative usage over time. The data for the selected time interval can be downloaded or sent to Starburst.

Cost estimation tool#

The cost estimation tool allows you to optionally input the cost factor (price per vCPU per hour) for the instance type used in your SEP cluster. The estimated cost of usage per filtered period of time is displayed.
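As an arithmetic sketch, the estimate is simply usage multiplied by the cost factor. The function and numbers below are illustrative only, not part of SEP:

```python
def estimated_cost(vcpu_hours: float, price_per_vcpu_hour: float) -> float:
    """Estimated cost: cluster usage in vCPU-hours times the cost factor."""
    return vcpu_hours * price_per_vcpu_hour

# Hypothetical example: 1,200 vCPU-hours at $0.05 per vCPU per hour
print(round(estimated_cost(1200, 0.05), 2))  # 60.0
```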

Usage metrics cost estimation tool

Automatic usage metrics upload#

Usage metrics are automatically uploaded to Starburst every hour. Each upload consists of the usage metrics collected since the previous upload. Note that if insights.persistence-enabled is set to false, usage metrics are still collected and stored in memory for up to 24 hours to support this upload.

You can configure this automatic metric upload in config.properties, but the default values should suffice for most deployments. The following snippet shows these configuration properties and their default values:

insights.data-upload-automatic.enabled=true
insights.data-upload-automatic.memory-store-interval=24h
insights.data-upload-automatic.interval=1h
insights.data-upload-protocol=https
insights.data-upload-endpoint=https://telemetry.eng.starburstdata.net/usage-metrics

Manual upload#

If the network upstream of your SEP cluster blocks the automatic usage metric upload, disable the automatic upload by setting insights.data-upload-automatic.enabled to false. You can then manually upload usage metrics data by following these steps:

  1. Download the usage metrics file from the usage-metrics view in Insights.

  2. On a device with internet access, upload the metrics file to Starburst.

Telemetry#

Starburst collects aggregated metrics about product performance and usage to inform our development efforts. All metrics are aggregated for the time period starting at start_time_unix_nano and ending at time_unix_nano. These timestamps are repeated with the same values for most metrics.

Metrics, with the exception of environmental data, are based on completed queries, whether successful or not. All data is packaged in JSON format. Examples of this data are provided below.

Telemetry is enabled by default. It can be disabled in Config properties. The following snippet shows the configuration properties and their default values:

telemetry.enabled=true
telemetry.metrics-export-enabled=true
telemetry.metrics-export-interval=1h
telemetry.metrics-endpoint=https://telemetry.eng.starburstdata.net/v1/metrics

Environment#

The environmental data describes the ownership, licensing and service information of an SEP deployment.

SEP environment information#

  • deployment.environment: Defined by node.environment in config.properties.

  • license.hash: The hash of the license file, if present.

  • license.owner: Defined by owner in the SEP license file, or taken from the account owner of the instance if deployed through a marketplace.

  • license.type: JSON for Kubernetes or manual deployments, or a string indicating the marketplace it was deployed through, such as AWS.

  • service.instance.id: A random UUID generated each time SEP starts.

  • service.name: Always set to starburst-enterprise.

  • service.version: The SEP version number of the deployment.

  • telemetry.sdk.language: Always set to java.

  • telemetry.sdk.name: Always set to opentelemetry.

  • telemetry.sdk.version: The version of the OpenTelemetry library in use, for example 1.7.0.

  • service.start_time: The ISO 8601 date and time when the coordinator last started.

The following is an example of the data collected that describes the SEP environment.

"resource":{
    "attributes":[
      {
          "key":"deployment.environment",
          "value":{
            "string_value":"prod"
          }
      },
      {
          "key":"license.hash",
          "value":{
            "string_value":"5000eRAND0M967d0004a4eLICENSEa97b00006023dedeSTRING82460c8500055"
          }
      },
      {
          "key":"license.owner",
          "value":{
            "string_value":"Example Company"
          }
      },
      {
          "key":"license.type",
          "value":{
            "string_value":"JSON"
          }
      },
      {
          "key":"service.instance.id",
          "value":{
            "string_value":"6d35zzzz-2000-4628-zzzz-120000zzzzed"
          }
      },
      {
          "key":"service.name",
          "value":{
            "string_value":"starburst-enterprise"
          }
      },
      {
          "key":"service.version",
          "value":{
            "string_value":"prod"
          }
      },
      {
          "key":"telemetry.sdk.language",
          "value":{
            "string_value":"java"
          }
      },
      {
          "key":"telemetry.sdk.name",
          "value":{
            "string_value":"opentelemetry"
          }
      },
      {
          "key":"telemetry.sdk.version",
          "value":{
            "string_value":"1.6.0"
          }
      }
    ]
},
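If you consume this payload programmatically, the OTLP-style attribute list flattens naturally into a dictionary. The following helper is illustrative only, not part of SEP:

```python
def attributes_to_dict(attributes):
    """Flatten an OTLP-style attribute list into a plain {key: value} dict."""
    return {a["key"]: a["value"]["string_value"] for a in attributes}

# Trimmed-down version of the resource payload shown above
resource = {
    "attributes": [
        {"key": "service.name", "value": {"string_value": "starburst-enterprise"}},
        {"key": "telemetry.sdk.language", "value": {"string_value": "java"}},
    ]
}
print(attributes_to_dict(resource["attributes"])["service.name"])  # starburst-enterprise
```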

queries_executed#

SEP collects aggregated query counts across certain specific dimensions.

Query execution count dimensions#

  • columnType: Total queries per column type, across all sources.

  • connector: Total queries per connector.

  • connector, queryType: Total queries by connector and query type.

  • function: Total queries by named function.

  • sessionProperty, value: Total queries using the named session property and a representation of its value. Boolean values are recorded as-is. Binary values are rounded to the nearest base-2 magnitude; for instance, 72 GB is recorded as 64 GB. Other numeric values are rounded to the nearest order of magnitude; for instance, 54,321 is recorded as 100,000. Text values are not recorded, only the fact that they were set.

  • source: Total queries per named client, as indicated by the client.
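The value-rounding rules for session properties can be sketched in a few lines of Python. The helper name is ours, and rounding by exponent is an assumption that reproduces the documented examples:

```python
import math

def round_to_power(value: float, base: int) -> int:
    """Round a positive value to the nearest power of `base`, by exponent."""
    return base ** round(math.log(value, base))

# Binary values: 72 (GB) is recorded as 64 (GB)
print(round_to_power(72, 2))       # 64
# Other numeric values: 54,321 is recorded as 100,000
print(round_to_power(54_321, 10))  # 100000
```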

The following is an example of the collected dimensional query execution data:

"name":"queries_executed",
"unit":"1",
"sum":{
    "data_points":[
      {
          "start_time_unix_nano":"1635164762424772000",
          "time_unix_nano":"1635172027851773000",
          "as_int":"3",
          "attributes":[
            {
                "key":"source",
                "value":{
                  "string_value":"trino-cli"
                }
            }
          ]
      },
      {
          "start_time_unix_nano":"1635164762424772000",
          "time_unix_nano":"1635172027851773000",
          "as_int":"1",
          "attributes":[
            {
                "key":"function",
                "value":{
                  "string_value":"max"
                }
            }
          ]
      },
      {
          "start_time_unix_nano":"1635164762424772000",
          "time_unix_nano":"1635172027851773000",
          "as_int":"2",
          "attributes":[
            {
                "key":"connector",
                "value":{
                  "string_value":"postgresql"
                }
            },
            {
                "key":"queryType",
                "value":{
                  "string_value":"SELECT"
                }
            }
          ]
      },
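On the receiving side, the per-dimension counts can be tallied by walking the data points. This helper is illustrative only, not part of SEP:

```python
def total_by_key(data_points, key):
    """Sum the `as_int` counts of data points carrying the given attribute key."""
    total = 0
    for dp in data_points:
        if any(attr["key"] == key for attr in dp.get("attributes", [])):
            total += int(dp["as_int"])
    return total

# Trimmed-down version of the data points shown above
points = [
    {"as_int": "3", "attributes": [{"key": "source", "value": {"string_value": "trino-cli"}}]},
    {"as_int": "1", "attributes": [{"key": "function", "value": {"string_value": "max"}}]},
]
print(total_by_key(points, "source"))    # 3
print(total_by_key(points, "function"))  # 1
```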

queries_failed#

SEP collects aggregated counts of query failures.

Query failure count dimensions#

  • errorCode, failureType: Total failed queries by error code and failure type.

The following is an example of the collected data:

"name":"queries_failed",
"unit":"1",
"sum":{
    "data_points":[
      {
          "start_time_unix_nano":"1635164762424772000",
          "time_unix_nano":"1635172027851773000",
          "as_int":"3",
          "attributes":[
            {
                "key":"error_code",
                "value":{
                  "int_value":"400"
                }
            },
            {
                "key":"failure_type",
                "value":{
                  "string_value":"Can't create database 'foo'; database exists"
                }
            }
          ]
      }
    ]
}

physical_input_bytes#

SEP collects the aggregated byte count of data in all processed queries.

Physical input bytes dimension counts#

  • connector: Total input bytes by connector.

The following is an example of the collected data:

"name":"physical_input_bytes",
"unit":"byte",
"sum":{
    "data_points":[
      {
          "start_time_unix_nano":"1635164762424772000",
          "time_unix_nano":"1635172027851773000",
          "as_int":"300",
          "attributes":[
            {
                "key":"connector",
                "value":{
                  "string_value":"postgresql"
                }
            }
          ]
      },
      {
          "start_time_unix_nano":"1635164762424772000",
          "time_unix_nano":"1635172027851773000",
          "as_int":"300"
          ]
      }
    ]
}

physical_input_rows#

SEP collects the aggregated count of input rows of data in all processed queries.

Physical input rows dimension counts#

  • connector: Total input rows by connector.

The following is an example of the collected data:

"name":"physical_input_rows",
"unit":"1",
"sum":{
    "data_points":[
      {
          "start_time_unix_nano":"1635164762424772000",
          "time_unix_nano":"1635172027851773000",
          "as_int":"300",
          "attributes":[
            {
                "key":"connector",
                "value":{
                  "string_value":"postgresql"
                }
            }
          ]
      },
      {
          "start_time_unix_nano":"1635164762424772000",
          "time_unix_nano":"1635172027851773000",
          "as_int":"300"
          ]
      }
    ]
}

Query performance and complexity metrics#

SEP collects aggregations of key performance and complexity measures of the queries it processes.

Query performance and complexity metrics#

  • analysis_time (Histogram): Binned query analysis times for all queries in the collection time period.

  • catalogs (Histogram): Binned number of distinct catalogs used in a query, for all queries in the collection time period.

  • connectors (Histogram): Binned number of distinct connectors used in a query, for all queries in the collection time period.

  • cpu_time (Histogram): Binned total CPU time spent processing a query, for all queries in the collection time period.

  • cumulative_memory (Single value): Cumulative memory used by a single query throughout its processing, aggregated across all queries in the collection time period. This is different from peak memory; not all of the cumulative memory may have been in use at the same time.

  • cumulative_system_memory (Single value): Cumulative memory used by queries in the collection period.

  • execution_time (Histogram): Binned query execution times for all queries in the collection time period.

  • input_columns (Histogram): Binned number of input columns used in a query, for all queries in the collection time period.

  • output_columns (Histogram): Binned number of output columns resulting from a query, for all queries in the collection time period.

  • peak_task_total_memory (Single value): Highest measured memory used by a task in the collection period.

  • peak_task_user_memory (Single value): Highest measured user memory used by a task in the collection period.

  • planning_time (Histogram): Binned query planning times for all queries in the collection time period.

  • queued_time (Histogram): Binned query queued times for all queries in the collection time period.

  • resource_waiting_time (Histogram): Binned resource waiting times for all queries in the collection time period.

  • scheduled_time (Histogram): Binned query scheduled times for all queries in the collection time period.

  • schemas (Histogram): Binned number of distinct schemas used in a query, for all queries in the collection time period.

  • splits (Single value): Total number of splits across all queries in the collection time period.

  • stages (Single value): Number of stages in a single query, aggregated across all queries in the collection time period.

  • stage_max_tasks (Histogram): Binned maximum number of tasks in any given stage of a single query, for all queries in the collection time period.

  • tables (Histogram): Binned number of distinct tables used in a query, for all queries in the collection time period.

  • table_max_columns (Histogram): Binned maximum number of columns in a single table, for all tables used in a query, for all queries in the collection time period.

  • wall_time (Histogram): Binned query wall times for all queries in the collection time period. Wall time does not include queued time.

The following is an example of a single-value metric:

{
  "name":"peak_task_total_memory",
  "unit":"byte",
  "sum":{
      "data_points":[
        {
            "start_time_unix_nano":"1635164762424772000",
            "time_unix_nano":"1635172027851773000",
            "as_int":"66609"
        }
      ],
      "aggregation_temporality":"AGGREGATION_TEMPORALITY_CUMULATIVE",
      "is_monotonic":true
  }
},

Performance data presented as a histogram also includes count and sum values. The count is the number of instances represented in the histogram, and the sum is the metric aggregated across all instances. In the following example, there were 3 queries with an aggregated analysis time of 1396.0 ms:

{
  "name":"analysis_time",
  "unit":"millisecond",
  "histogram":{
      "data_points":[
        {
            "start_time_unix_nano":"1635164762424772000",
            "time_unix_nano":"1635172027851773000",
            "count":"3",
            "sum":1396.0,
            "bucket_counts":[
              "0",
              "0",
              "2",
              "1",
              "0",
              "0",
              "0",
              "0",
              "0",
              "0",
              "0"
            ],
            "explicit_bounds":[
              10.0,
              100.0,
              500.0,
              1000.0,
              2000.0,
              10000.0,
              60000.0,
              300000.0,
              3600000.0,
              86400000.0
            ]
        }
      ],
      "aggregation_temporality":"AGGREGATION_TEMPORALITY_CUMULATIVE"
  }
},
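The bucket_counts align with explicit_bounds following the standard OTLP convention: bucket i holds values in the range (bounds[i-1], bounds[i]], and a final bucket holds everything above the last bound. A sketch of that assignment, assuming this convention, with hypothetical analysis times consistent with the example above (two in the (100, 500] bucket, one in (500, 1000]):

```python
import bisect

def bucket_index(value, explicit_bounds):
    """Index of the histogram bucket a value falls into: bucket i covers
    (bounds[i-1], bounds[i]]; the final bucket covers values above the
    last bound."""
    return bisect.bisect_left(explicit_bounds, value)

bounds = [10.0, 100.0, 500.0, 1000.0, 2000.0, 10000.0,
          60000.0, 300000.0, 3600000.0, 86400000.0]

for t in (400.0, 450.0, 546.0):  # hypothetical analysis times in ms
    print(bucket_index(t, bounds))  # 2, 2, 3
```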