Telemetry#

Starburst Enterprise platform (SEP) collects the following data about product performance and usage. This information is sent to Starburst and informs our development efforts. Starburst can share the data with you on request.

The collected data is sent to the following endpoints in a compressed, binary Protobuf format:

  • https://telemetry.eng.starburstdata.net/v1/metrics

  • https://telemetry.eng.starburstdata.net/v1/logs

All connections use TLS, so they are secured and the data is end-to-end encrypted.
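If outbound traffic from the cluster is filtered, it can help to derive the host and port that must be reachable from the endpoint list above. The following Python sketch does that with the standard library only. The function name egress_targets is hypothetical, and port 443 is assumed because the URLs use https without an explicit port.

```python
from urllib.parse import urlsplit

# Endpoints copied from the list above.
TELEMETRY_ENDPOINTS = [
    "https://telemetry.eng.starburstdata.net/v1/metrics",
    "https://telemetry.eng.starburstdata.net/v1/logs",
]


def egress_targets(urls):
    """Return the sorted, de-duplicated (host, port) pairs behind the URLs."""
    targets = set()
    for url in urls:
        parts = urlsplit(url)
        # Assumption: fall back to the scheme's standard port when none is given.
        port = parts.port or (443 if parts.scheme == "https" else 80)
        targets.add((parts.hostname, port))
    return sorted(targets)


if __name__ == "__main__":
    for host, port in egress_targets(TELEMETRY_ENDPOINTS):
        print(f"allow outbound TCP to {host}:{port}")
```

Both endpoints share one host, so a single outbound rule covers metrics and logs in this sketch.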

Configuration#

Telemetry is enabled by default, with the exception of query logging. You must opt in to query logging by configuring it. All configuration properties listed here are set in the Config properties file.

Telemetry configuration properties#

Property name

Description

telemetry.enabled

Set to false to completely disable the telemetry module. Defaults to true.

telemetry.metrics-export-enabled

Set to false to disable collecting and exporting metrics. Defaults to true.

telemetry.metrics-export-interval

Frequency at which metrics are sent to Starburst. Defaults to 1h, sending approximately 100 KB of data each time.

telemetry.log-created

Set to true to log queries when they are created. Defaults to false.

telemetry.log-completed

Set to true to log queries when they are completed. Defaults to false.

telemetry.log-split

Set to true to log splits from query processing. Defaults to false.

telemetry.log-query-types

A comma-separated list of query types to be logged. Possible query types are SELECT, EXPLAIN, DESCRIBE, INSERT, UPDATE, DELETE, ANALYZE, DATA_DEFINITION, and ALTER_TABLE_EXECUTE.

telemetry.log-query-plan

Set to ORIGINAL to log the full query plan. Defaults to DISABLED.

telemetry.log-query-text

Set to ORIGINAL to log the full query text. Defaults to DISABLED.

telemetry.log-query-statistics

Set to ORIGINAL to log the full query statistics. Defaults to DISABLED.

telemetry.log-query-failure-info

Set to ORIGINAL to log the full query failure info. Defaults to DISABLED.

telemetry.log-query-warnings

Set to ORIGINAL to log full query warnings. Defaults to DISABLED.

telemetry.config-export-enabled

Set to false to disable collecting and exporting configuration. Defaults to true.

telemetry.logs-export-interval

Duration to wait and batch more logs, before sending them to Starburst. Defaults to 2s.

telemetry.logs-batch-size

Maximum number of log entries collected before sending them to Starburst. Defaults to 1000.

telemetry.logs-batch-size

Maximum size of the logs batch, before sending it to Starburst. Defaults to 1MB.

All the data collected is annotated with the environmental data described in the next section.
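As a rough sizing aid for the telemetry.metrics-export-interval default listed in the table above, the following Python sketch estimates the daily metrics volume. It assumes that each export stays at roughly 100 KB regardless of the interval, which the documentation states only for the default. The duration parser is a hypothetical helper that handles only single-unit values such as 1h or 30m.

```python
SECONDS_PER_UNIT = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration_seconds(text):
    """Parse a single-unit duration such as '1h' or '30m' (illustrative only)."""
    return float(text[:-1]) * SECONDS_PER_UNIT[text[-1]]


def metrics_kb_per_day(interval="1h", payload_kb=100):
    """Estimate the daily metrics volume in KB for a given export interval."""
    # Assumption: payload size per export does not change with the interval.
    exports_per_day = 86400 / parse_duration_seconds(interval)
    return exports_per_day * payload_kb


if __name__ == "__main__":
    print(metrics_kb_per_day())        # default interval of 1h
    print(metrics_kb_per_day("30m"))   # a shorter, hypothetical interval
```

With the defaults this works out to 24 exports per day, or about 2.4 MB of metrics data.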

Enable query logging#

SEP can collect non-anonymized query logs, which may contain sensitive information, such as may be found in WHERE and other predicate clauses. It is an opt-in feature that must be specifically configured.

Caution

This feature is provided to ensure that data affecting user experience and performance are captured. If enabled, we strongly recommend limiting its use to a staging environment with limited access for purposes of testing the effect on performance.

To configure SEP to collect the query text, plan, and failure info of all completed SELECT queries, use the following configuration:

telemetry.log-completed=true
telemetry.log-query-types=SELECT
telemetry.log-query-plan=ORIGINAL
telemetry.log-query-text=ORIGINAL
telemetry.log-query-failure-info=ORIGINAL
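Because query logging can expose sensitive query content, it may be useful to check a configuration file for the opt-in properties before deploying it. The following Python sketch is one way to do that. It assumes simple key=value lines and treats any value other than DISABLED as enabling a detail property. The helper names are hypothetical, and this is not a Starburst tool.

```python
EVENT_SWITCHES = (
    "telemetry.log-created",
    "telemetry.log-completed",
    "telemetry.log-split",
)
DETAIL_PROPERTIES = (
    "telemetry.log-query-plan",
    "telemetry.log-query-text",
    "telemetry.log-query-statistics",
    "telemetry.log-query-failure-info",
    "telemetry.log-query-warnings",
)

SAMPLE_CONFIG = """
telemetry.log-completed=true
telemetry.log-query-types=SELECT
telemetry.log-query-plan=ORIGINAL
telemetry.log-query-text=ORIGINAL
telemetry.log-query-failure-info=ORIGINAL
"""


def parse_properties(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def query_logging_summary(props):
    """List the query-logging properties that are switched on."""
    if props.get("telemetry.enabled", "true").lower() == "false":
        return {"events": [], "details": []}
    events = [k for k in EVENT_SWITCHES
              if props.get(k, "false").lower() == "true"]
    # Assumption: anything other than DISABLED turns the detail on.
    details = [k for k in DETAIL_PROPERTIES
               if props.get(k, "DISABLED").upper() != "DISABLED"]
    return {"events": events, "details": details}


if __name__ == "__main__":
    print(query_logging_summary(parse_properties(SAMPLE_CONFIG)))
```

An empty result for both lists means the file leaves query logging at its documented defaults.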

Collected data#

With the exception of environmental and configuration log data, the collected data is based on completed queries, whether successful or not. Examples of this data are provided in the following sections.

Environmental data#

The environmental data describes the ownership, licensing and service information of every SEP cluster.

SEP environment information#

Key

Value

deployment.environment

Environment name defined in SEP.

license.hash

The hash of the license file, if present.

license.owner

Defined by owner in the SEP license file, or from the account owner of the instance if deployed through a marketplace.

license.type

JSON for Kubernetes or manual deployments, or a string indicating the marketplace it was deployed through, such as AWS.

service.instance.id

A random UUID generated each time SEP starts.

service.name

Always set to starburst-enterprise.

service.version

The SEP version number of the cluster.

telemetry.sdk.language

Always set to java.

telemetry.sdk.name

Always set to opentelemetry.

telemetry.sdk.version

The version of the OpenTelemetry library used.

service.start_time

The ISO 8601 date and time when the coordinator last started.

The following is an example of the data collected that describes the SEP environment:

"resource":{
    "attributes":[
      {
          "key":"deployment.environment",
          "value":{
            "string_value":"prod"
          }
      },
      {
          "key":"license.hash",
          "value":{
            "string_value":"5000eRAND0M967d0004a4eLICENSEa97b00006023dedeSTRING82460c8500055"
          }
      },
      {
          "key":"license.owner",
          "value":{
            "string_value":"Example Company"
          }
      },
      {
          "key":"license.type",
          "value":{
            "string_value":"JSON"
          }
      },
      {
          "key":"service.instance.id",
          "value":{
            "string_value":"6d35zzzz-2000-4628-zzzz-120000zzzzed"
          }
      },
      {
          "key":"service.name",
          "value":{
            "string_value":"starburst-enterprise"
          }
      },
      {
          "key":"service.version",
          "value":{
            "string_value":"<SEP version number>"
          }
      },
      {
          "key":"telemetry.sdk.language",
          "value":{
            "string_value":"java"
          }
      },
      {
          "key":"telemetry.sdk.name",
          "value":{
            "string_value":"opentelemetry"
          }
      },
      {
          "key":"telemetry.sdk.version",
          "value":{
            "string_value":"1.6.0"
          }
      }
    ]
}

Configuration log data#

SEP collects configuration property names and a representation of each value. Boolean values are recorded as-is. Binary values are rounded to the nearest base 2 magnitude; for instance, 72 GB is recorded as 64 GB. Other numeric values, such as INTEGER and DOUBLE, are rounded to the nearest order of magnitude; for instance, 54,321 is recorded as 100,000. Text values are not recorded, only the fact that they are set.
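The two worked examples (72 GB recorded as 64 GB, and 54,321 recorded as 100,000) are both reproduced by rounding the exponent to the nearest whole number on a logarithmic scale. The following Python sketch illustrates that reading. It is an assumption chosen to match the examples, not a description of SEP's actual implementation, and the function name nearest_magnitude is hypothetical.

```python
import math


def nearest_magnitude(value, base):
    """Round a positive number to the nearest power of `base` in log space."""
    if value <= 0:
        raise ValueError("value must be positive")
    # Assumption: "nearest" is measured on the exponent, not on the raw value.
    exponent = round(math.log(value) / math.log(base))
    return base ** exponent


if __name__ == "__main__":
    print(nearest_magnitude(72, 2))       # binary value, in GB
    print(nearest_magnitude(54321, 10))   # other numeric value
```

The result for the binary case is the same whether 72 GB is expressed in GB or in bytes, because changing the unit only shifts the exponent by a whole number.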

The following JSON snippet is an example of the data collected that describes SEP configuration properties:

"logs": [
  {
    "time_unix_nano": "1637193575209705000",
    "severity_number": "SEVERITY_NUMBER_INFO",
    "name": "bootstrap",
    "body": {
      "string_value": ""
    },
    "attributes": [
      {
        "key": "propertyName",
        "value": {
          "string_value": "cache-service.cache-ttl"
        }
      },
      {
        "key": "propertyValue",
        "value": {
          "string_value": "0.00ns"
        }
      }
    ]
  },
  {
    "time_unix_nano": "1637193575223536000",
    "severity_number": "SEVERITY_NUMBER_INFO",
    "name": "bootstrap",
    "body": {
      "string_value": ""
    },
    "attributes": [
      {
        "key": "propertyName",
        "value": {
          "string_value": "cache-service.uri"
        }
      }
    ]
  },
  {
    "time_unix_nano": "1637193575224097000",
    "severity_number": "SEVERITY_NUMBER_INFO",
    "name": "bootstrap",
    "body": {
      "string_value": ""
    },
    "attributes": [
      {
        "key": "propertyName",
        "value": {
          "string_value": "materialized-views.namespace"
        }
      }
    ]
  }
]
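To show how the entries above can be read, the following Python sketch maps each property name to its recorded value, or to None when only the fact that the property is set was recorded. It assumes the payload has already been decoded from Protobuf into the JSON-like shape shown in the example. The function name and the shortened sample data are illustrative.

```python
# Shortened copy of the example above, as Python data.
SAMPLE_LOGS = [
    {
        "name": "bootstrap",
        "attributes": [
            {"key": "propertyName",
             "value": {"string_value": "cache-service.cache-ttl"}},
            {"key": "propertyValue",
             "value": {"string_value": "0.00ns"}},
        ],
    },
    {
        "name": "bootstrap",
        "attributes": [
            {"key": "propertyName",
             "value": {"string_value": "cache-service.uri"}},
        ],
    },
]


def summarize_config_logs(logs):
    """Map property name -> recorded value (None when only 'set' is known)."""
    summary = {}
    for entry in logs:
        attrs = {a["key"]: a["value"].get("string_value")
                 for a in entry.get("attributes", [])}
        name = attrs.get("propertyName")
        if name is not None:
            summary[name] = attrs.get("propertyValue")
    return summary


if __name__ == "__main__":
    for name, value in summarize_config_logs(SAMPLE_LOGS).items():
        print(name, "->", value if value is not None else "(set, value not recorded)")
```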

Metrics#

All metrics collected by SEP are aggregated for the time period starting at start_time_unix_nano and ending at time_unix_nano. Most metrics repeat the same two timestamp values.

Metrics are based on completed queries, whether successful or not. Examples of this data are provided below.
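The aggregation window can be made readable by converting the two nanosecond timestamps to UTC datetimes. The following Python sketch uses the timestamp values that appear in the metric examples on this page. The function names are hypothetical, and sub-microsecond digits are dropped because Python datetimes have microsecond resolution.

```python
from datetime import datetime, timedelta, timezone


def nanos_to_datetime(nanos):
    """Convert Unix nanoseconds (int or numeric string) to a UTC datetime."""
    seconds, remainder = divmod(int(nanos), 1_000_000_000)
    # Integer arithmetic avoids float rounding on 19-digit timestamps.
    return (datetime.fromtimestamp(seconds, tz=timezone.utc)
            + timedelta(microseconds=remainder // 1000))


def aggregation_window(point):
    """Return (start, end, duration) for one metric data point."""
    start = nanos_to_datetime(point["start_time_unix_nano"])
    end = nanos_to_datetime(point["time_unix_nano"])
    return start, end, end - start


if __name__ == "__main__":
    sample_point = {
        "start_time_unix_nano": "1635164762424772000",
        "time_unix_nano": "1635172027851773000",
    }
    start, end, duration = aggregation_window(sample_point)
    print(start.isoformat(), end.isoformat(), duration)
```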

queries_executed#

SEP collects aggregated counts of specific query dimensions as described in the following table.

Query execution count dimensions#

Dimension

Description

columnType

Total queries per column type, across all sources.

connector

Total queries per connector.

connector, queryType

Total queries by connector and query type. Possible query types are SELECT, EXPLAIN, DESCRIBE, INSERT, UPDATE, DELETE, ANALYZE, DATA_DEFINITION, and ALTER_TABLE_EXECUTE.

function

Total queries by named function or UDF.

sessionProperty, value

Total queries using a named session property or catalog session property, and a representation of the value. Boolean values are recorded as-is. Binary values are rounded to the nearest base 2 magnitude; for instance, 72 GB is recorded as 64 GB. Other numeric values are rounded to the nearest order of magnitude; for instance, 54,321 is recorded as 100,000. Text values are not recorded, only the fact that they were set.

source

Total queries per named client, as supplied by the client, such as “trino-cli”.

The following is an example of the collected dimensional query execution data:

"name":"queries_executed",
"unit":"1",
"sum":{
    "data_points":[
      {
          "start_time_unix_nano":"1635164762424772000",
          "time_unix_nano":"1635172027851773000",
          "as_int":"3",
          "attributes":[
            {
                "key":"source",
                "value":{
                  "string_value":"trino-cli"
                }
            }
          ]
      },
      {
          "start_time_unix_nano":"1635164762424772000",
          "time_unix_nano":"1635172027851773000",
          "as_int":"1",
          "attributes":[
            {
                "key":"function",
                "value":{
                  "string_value":"max"
                }
            }
          ]
      },
      {
          "start_time_unix_nano":"1635164762424772000",
          "time_unix_nano":"1635172027851773000",
          "as_int":"2",
          "attributes":[
            {
                "key":"connector",
                "value":{
                  "string_value":"postgresql"
                }
            },
            {
                "key":"queryType",
                "value":{
                  "string_value":"SELECT"
                }
            }
          ]
      }
    ]
}
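Each data point in the example above carries one count and the attribute keys that identify its dimension. The following Python sketch groups the counts by dimension, assuming the metric has been decoded into the JSON-like shape shown. The function name and the condensed sample data are illustrative.

```python
# Condensed copy of the example data points, as Python data.
SAMPLE_METRIC = {
    "name": "queries_executed",
    "sum": {
        "data_points": [
            {"as_int": "3", "attributes": [
                {"key": "source", "value": {"string_value": "trino-cli"}}]},
            {"as_int": "1", "attributes": [
                {"key": "function", "value": {"string_value": "max"}}]},
            {"as_int": "2", "attributes": [
                {"key": "connector", "value": {"string_value": "postgresql"}},
                {"key": "queryType", "value": {"string_value": "SELECT"}}]},
        ]
    },
}


def counts_by_dimension(metric):
    """Group query counts as {dimension keys: {dimension values: count}}."""
    totals = {}
    for point in metric["sum"]["data_points"]:
        attrs = {a["key"]: a["value"]["string_value"]
                 for a in point.get("attributes", [])}
        dimension = tuple(sorted(attrs))
        values = tuple(attrs[key] for key in dimension)
        per_dimension = totals.setdefault(dimension, {})
        # as_int is rendered as a string in the example, so convert it.
        per_dimension[values] = per_dimension.get(values, 0) + int(point["as_int"])
    return totals


if __name__ == "__main__":
    for dimension, counts in counts_by_dimension(SAMPLE_METRIC).items():
        print(dimension, counts)
```

Counts from different dimensions describe the same queries from different angles, so they should not be added together.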

queries_failed#

SEP collects aggregated counts of query failures.

Query failure count dimensions#

Dimension

Description

errorCode, failureType

Total failed queries by error code and failure type. Error codes are numeric code values. Failure types are exception class names, such as io.trino.plugin.hive.ViewAlreadyExistsException, the generic io.trino.spi.TrinoException, or java.lang.NullPointerException.

The following is an example of the collected data:

"name":"queries_failed",
"unit":"1",
"sum":{
    "data_points":[
      {
          "start_time_unix_nano":"1635164762424772000",
          "time_unix_nano":"1635172027851773000",
          "as_int":"3",
          "attributes":[
            {
                "key":"error_code",
                "value":{
                  "int_value":"400"
                }
            },
            {
                "key":"failure_type",
                "value":{
                  "string_value":"Can't create database 'foo'; database exists"
                }
            }
          ]
      }
    ]
}

physical_input_bytes#

SEP collects the aggregated byte count of data in all processed queries.

Physical input bytes dimension counts#

Dimension

Description

connector

Total input bytes by connector.

The following is an example of the collected data:

"name":"physical_input_bytes",
"unit":"byte",
"sum":{
    "data_points":[
      {
          "start_time_unix_nano":"1635164762424772000",
          "time_unix_nano":"1635172027851773000",
          "as_int":"300",
          "attributes":[
            {
                "key":"connector",
                "value":{
                  "string_value":"postgresql"
                }
            }
          ]
      },
      {
          "start_time_unix_nano":"1635164762424772000",
          "time_unix_nano":"1635172027851773000",
          "as_int":"300"
      }
    ]
}

physical_input_rows#

SEP collects the aggregated count of input rows of data in all processed queries.

Physical input rows dimension counts#

Dimension

Description

connector

Total input rows by connector.

The following is an example of the collected data:

"name":"physical_input_rows",
"unit":"1",
"sum":{
    "data_points":[
      {
          "start_time_unix_nano":"1635164762424772000",
          "time_unix_nano":"1635172027851773000",
          "as_int":"300",
          "attributes":[
            {
                "key":"connector",
                "value":{
                  "string_value":"postgresql"
                }
            }
          ]
      },
      {
          "start_time_unix_nano":"1635164762424772000",
          "time_unix_nano":"1635172027851773000",
          "as_int":"300"
      }
    ]
}

Query performance and complexity metrics#

SEP collects aggregations of key performance and complexity measures of the queries it processes.

Query performance and complexity metrics#

Metric

Data type

Description

analysis_time

Histogram

Binned query analysis times for all queries in the collection time period.

catalogs

Histogram

Binned number of distinct catalogs used in a query for all queries in the collection time period.

connectors

Histogram

Binned number of distinct connectors used in a query for all queries in the collection time period.

cpu_time

Histogram

Binned total CPU time spent processing a query, for all queries in the collection time period.

cumulative_memory

Single value

Binned cumulative memory for a single query throughout its processing, for all queries in the collection time period. This is different from peak memory; not all of the cumulative memory may have been in use at the same time.

cumulative_system_memory

Single value

Cumulative memory used by queries in the collection period.

execution_time

Histogram

Binned query execution times for all queries in the collection time period.

input_columns

Histogram

Binned number of input columns used in a query for all queries in the collection time period.

output_columns

Histogram

Binned number of output columns resulting from a query for all queries in the collection time period.

peak_task_total_memory

Single value

Highest measured memory used by a task in the collection period.

peak_task_user_memory

Single value

Highest measured user memory used by a task in the collection period.

planning_time

Histogram

Binned query planning times for all queries in the collection time period.

queued_time

Histogram

Binned query queued times for all queries in the collection time period.

resource_waiting_time

Histogram

Binned resource waiting times for all queries in the collection time period.

scheduled_time

Histogram

Binned scheduled times for all queries in the collection time period.

schemas

Histogram

Binned number of distinct schemas used in a query for all queries in the collection time period.

splits

Single value

Total number of splits across all queries in the collection time period.

stages

Single value

Binned number of stages for a single query, for all queries in the collection time period.

stage_max_tasks

Histogram

Binned number of tasks in any given stage for a single query, for all queries in the collection time period.

tables

Histogram

Binned number of distinct tables used in a query for all queries in the collection time period.

table_max_columns

Histogram

Binned number of columns in a single table for all tables used in a query for all queries in the collection time period.

wall_time

Histogram

Binned query wall times for all queries in the collection time period. Wall time does not include queued time.

The following is an example of a single-value metric:

{
  "name":"peak_task_total_memory",
  "unit":"byte",
  "sum":{
      "data_points":[
        {
            "start_time_unix_nano":"1635164762424772000",
            "time_unix_nano":"1635172027851773000",
            "as_int":"66609"
        }
      ],
      "aggregation_temporality":"AGGREGATION_TEMPORALITY_CUMULATIVE",
      "is_monotonic":true
  }
},

Performance data that is presented in a histogram also includes count and sum values. The count is the number of instances represented in the histogram, and the sum is the metric aggregated across all instances. In the following example, three queries have an aggregated analysis time of 1396.0 ms:

{
  "name":"analysis_time",
  "unit":"millisecond",
  "histogram":{
      "data_points":[
        {
            "start_time_unix_nano":"1635164762424772000",
            "time_unix_nano":"1635172027851773000",
            "count":"3",
            "sum":1396.0,
            "bucket_counts":[
              "0",
              "0",
              "2",
              "1",
              "0",
              "0",
              "0",
              "0",
              "0",
              "0",
              "0"
            ],
            "explicit_bounds":[
              10.0,
              100.0,
              500.0,
              1000.0,
              2000.0,
              10000.0,
              60000.0,
              300000.0,
              3600000.0,
              86400000.0
            ]
        }
      ],
      "aggregation_temporality":"AGGREGATION_TEMPORALITY_CUMULATIVE"
  }
},
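The following Python sketch shows one way to read the histogram example above. It follows the OpenTelemetry convention for explicit-bounds histograms: bucket i counts values greater than explicit_bounds[i-1] and up to explicit_bounds[i], and the first and last buckets are open-ended. The function name is hypothetical, and the data point is copied from the example without its timestamps.

```python
import math

SAMPLE_POINT = {
    "count": "3",
    "sum": 1396.0,
    "bucket_counts": ["0", "0", "2", "1", "0", "0", "0", "0", "0", "0", "0"],
    "explicit_bounds": [10.0, 100.0, 500.0, 1000.0, 2000.0, 10000.0,
                        60000.0, 300000.0, 3600000.0, 86400000.0],
}


def describe_histogram(point):
    """Return the non-empty buckets as (low, high, count) and the mean."""
    counts = [int(c) for c in point["bucket_counts"]]
    edges = [-math.inf, *point["explicit_bounds"], math.inf]
    # There is always one more bucket than there are explicit bounds.
    if len(counts) != len(edges) - 1:
        raise ValueError("bucket_counts and explicit_bounds do not match")
    buckets = [(edges[i], edges[i + 1], count)
               for i, count in enumerate(counts) if count]
    mean = point["sum"] / int(point["count"])
    return buckets, mean


if __name__ == "__main__":
    buckets, mean = describe_histogram(SAMPLE_POINT)
    for low, high, count in buckets:
        print(f"{count} queries with analysis time in ({low}, {high}] ms")
    print(f"mean analysis time: {mean:.1f} ms")
```

For the example data, two queries fall between 100 ms and 500 ms and one falls between 500 ms and 1000 ms, for a mean of roughly 465 ms.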

Optional query log data#

If query log collection is enabled, each query processed results in one or more associated log entries. The following is an example of a query log entry:

"logs": [
  {
    "time_unix_nano": "1635515535751000000",
    "severity_number": "SEVERITY_NUMBER_INFO",
    "name": "queryCompletedEvent",
    "body": {
      "string_value": ""
    },
    "attributes": [
      {
        "key": "createTime",
        "value": {
          "string_value": "2021-10-29T13:52:13.288Z"
        }
      },
      {
        "key": "endTime",
        "value": {
          "string_value": "2021-10-29T13:52:15.654Z"
        }
      },
      {
        "key": "executionStartTime",
        "value": {
          "string_value": "2021-10-29T13:52:13.501Z"
        }
      },
      {
        "key": "failureInfo",
        "value": {
          "string_value": "null"
        }
      },
      {
        "key": "metadata.plan",
        "value": {
          "string_value": "Fragment 0 [SINGLE]\n    CPU: 18.33ms, Scheduled: 24.11ms, Input: 598 rows (65.56kB); per task: avg.: 598.00 std.dev.: 0.00, Output: 598 rows (57.21kB)\n    Output layout: [field, field_0, field_1, field_2, field_3, field_4]\n    ..."
        }
      },
      {
        "key": "metadata.query",
        "value": {
          "string_value": "SHOW FUNCTIONS"
        }
      },
      {
        "key": "statistics",
        "value": {
          "string_value": "{\"cpuTime\":0.097000000,...}"
        }
      },
      {
        "key": "warnings",
        "value": {
          "string_value": "[]"
        }
      }
    ]
  }
]
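A query log entry like the one above can be flattened into a dictionary for inspection. The following Python sketch does that and derives the time between createTime and endTime. It assumes the entry has been decoded into the JSON-like shape shown. The helper names are hypothetical, and the sample keeps only a few of the attributes from the example.

```python
from datetime import datetime

SAMPLE_ENTRY = {
    "name": "queryCompletedEvent",
    "attributes": [
        {"key": "createTime",
         "value": {"string_value": "2021-10-29T13:52:13.288Z"}},
        {"key": "endTime",
         "value": {"string_value": "2021-10-29T13:52:15.654Z"}},
        {"key": "metadata.query",
         "value": {"string_value": "SHOW FUNCTIONS"}},
    ],
}


def attributes_to_dict(entry):
    """Flatten the key/value attribute list into a plain dictionary."""
    return {a["key"]: a["value"].get("string_value")
            for a in entry["attributes"]}


def parse_utc(text):
    """Parse an ISO 8601 timestamp with a trailing Z as a UTC datetime."""
    # Older Python versions do not accept "Z" in fromisoformat().
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def query_duration_seconds(entry):
    """Seconds between query creation and query end."""
    attrs = attributes_to_dict(entry)
    elapsed = parse_utc(attrs["endTime"]) - parse_utc(attrs["createTime"])
    return elapsed.total_seconds()


if __name__ == "__main__":
    attrs = attributes_to_dict(SAMPLE_ENTRY)
    print(attrs["metadata.query"], query_duration_seconds(SAMPLE_ENTRY))
```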