Kubernetes configuration examples#

The following sections list the default configuration values and practical examples for configuring Starburst Enterprise in Kubernetes:

Adding the license file#

Starburst provides customers with a license file that unlocks additional features of SEP. Provide the license file to SEP in the cluster as follows:

  1. Rename the file you received to starburstdata.license.

  2. Create a k8s secret in the cluster, with a name of your choice, that contains the license file.

    kubectl create secret generic mylicense --from-file=starburstdata.license
    
  3. Configure the secret name as the Starburst platform license.

    starburstPlatformLicense: mylicense
    

Images and repository registry credentials examples#

Defaults#

The following are the image- and registry-related defaults. Do not place unchanged values in customization files. Instead, follow best practices for creating YAML files for customizing SEP:

image:
  repository: "harbor.starburstdata.net/starburstdata/starburst-enterprise"
  tag: "364-e.3"
  pullPolicy: "IfNotPresent"

initImage:
  repository: "harbor.starburstdata.net/starburstdata/starburst-enterprise-init"
  tag: "364.3.0"
  pullPolicy: "IfNotPresent"

registryCredentials:
  enabled: false
  # Replace this with Docker Registry that you use
  registry:
  username:
  password:

imagePullSecrets:

Controlling SEP releases#

In normal usage you do not need to configure the image, because each chart version update already includes the matching version update of the Docker images. In rare cases, it can be useful to update the Docker image without changing the overall chart version used. For example, you can upgrade from 3xx-e.1 to a newer patch version 3xx-e.3 of SEP while keeping the rest of the chart configuration unchanged:

image:
  repository: "harbor.starburstdata.net/starburstdata/starburst-enterprise"
  tag: "3xx-e.3"
  pullPolicy: "IfNotPresent"

Using private registries#

In some organizations you need to use private registries and repositories instead of Starburst Harbor. They are often hosted on a private Harbor instance or in a repository manager. You can publish the Helm charts and Docker containers to your private setup, or use a proxying setup. Steps to set this up vary widely based on your tools, and require both Docker and Helm expertise:

  • Pull the Docker image from the Starburst Harbor registry with your credentials

  • Tag the image as desired for your internal registry

  • Push the image to your registry

  • Download the Helm charts

  • Publish the Helm charts to your Helm repository
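
The first three steps can be sketched as a small helper that prints the docker commands for one image. This is a minimal sketch, not a Starburst-provided tool: the function name mirror_commands is hypothetical, the source image and tag come from the defaults above, and docker.example.com/thirdparty is the placeholder private registry used in this section. It assumes you have already run docker login against both registries.

```shell
#!/bin/sh
# Hypothetical helper: print the docker commands that copy one image
# from the Starburst Harbor registry into a private registry.
#   $1: full source image reference, including the tag
#   $2: target registry and organization prefix (assumption: the image
#       name and tag stay unchanged in the private registry)
mirror_commands() {
  src=$1
  target="$2/${src##*/}"   # keep only the "name:tag" part of the source
  printf 'docker pull %s\n' "$src"
  printf 'docker tag %s %s\n' "$src" "$target"
  printf 'docker push %s\n' "$target"
}

# Usage: review the printed commands, then pipe them to sh to run them.
# mirror_commands harbor.starburstdata.net/starburstdata/starburst-enterprise:364-e.3 docker.example.com/thirdparty
```

Printing the commands instead of running them directly makes it easy to review the target names first. The same call with the starburst-enterprise-init image covers initImage.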

After the images and charts are published, point the chart at your private setup. The following example overrides the Docker registry to use your private registry:

image:
  repository: "docker.example.com/thirdparty/starburst-enterprise"
  tag: "364-e.3"
  pullPolicy: "IfNotPresent"

You also need to update your registry access configuration:

registryCredentials:
  enabled: true
  registry: docker.example.com
  username: myusername
  password: mypassword

You can also use imagePullSecrets: with private registries instead of registryCredentials.

If you changed the Docker image organization, name, or version tags, you also need to override these details in your YAML configuration files. For example, update image and initImage for SEP. Similar steps are necessary if you are using the HMS or Ranger charts.

Using imagePullSecrets#

As an alternative to registryCredentials:, you can use imagePullSecrets: to authenticate with a Docker registry. Pass a list of the names of Kubernetes secrets of type kubernetes.io/dockerconfigjson in the following format:

imagePullSecrets:
  - name: secret1
  - name: secret2

Detailed instructions for using private registries with pull secrets can be found in the Kubernetes documentation.
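
As an illustration of what such a secret holds, the following sketch builds the .dockerconfigjson payload for a single registry. The function name is hypothetical and the values in the usage lines are the placeholders from the private registry example. In practice, kubectl create secret docker-registry generates an equivalent structure for you.

```shell
#!/bin/sh
# Hypothetical helper: print the JSON payload stored under the
# ".dockerconfigjson" key of a kubernetes.io/dockerconfigjson secret.
#   $1: registry host, $2: username, $3: password
# Assumption: the values contain no characters that need JSON escaping.
dockerconfigjson() {
  # "auth" is the base64 encoding of "username:password"
  auth=$(printf '%s:%s' "$2" "$3" | base64 | tr -d '\n')
  printf '{"auths":{"%s":{"username":"%s","password":"%s","auth":"%s"}}}\n' \
    "$1" "$2" "$3" "$auth"
}

# Usage: write the payload to a file and create the secret from it.
# dockerconfigjson docker.example.com myusername mypassword > config.json
# kubectl create secret generic secret1 \
#   --type=kubernetes.io/dockerconfigjson \
#   --from-file=.dockerconfigjson=config.json
```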

Internal communications configuration#

Defaults#

The following are the internal communications-related defaults in the values.yaml file. Do not place unchanged values in customization files. Instead, follow best practices for creating YAML files for customizing SEP:

sharedSecret:

environment:

internalTls: false

internal:
  ports:
    http:
      port: 8080
    https:
      port: 8443

Using TLS for internal communication#

You can optionally enable TLS for internal communication if the cluster network is considered insecure or TLS is otherwise required, and the performance overhead is acceptable.

Configuring automatic internal TLS#

Set internalTls to true to enable TLS for internal communication in SEP. Certificates are generated and used automatically. To enable this feature, you must also configure the environment: and sharedSecret: top-level nodes, as well as internal.ports.https.port:.

environment: production
sharedSecret: AN0Qhhw9PsZmEgEXAMPLEkIj3AJZ5/Mnyy5iRANDOMceM+SSV+APSTiSTRING
internalTls: true
internal:
  ports:
    http:
      port: 8080
    https:
      port: 8443
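
The sharedSecret value in this example is a random string. One way to produce such a string is sketched below. The function name and the choice of 64 random bytes are assumptions, not chart requirements.

```shell
#!/bin/sh
# Hypothetical helper: print a random, base64-encoded string that can be
# used as a sharedSecret value. The 64-byte length is an assumption.
generate_shared_secret() {
  head -c 64 /dev/urandom | base64 | tr -d '\n'
  echo
}

# Usage:
# generate_shared_secret
```

Generate the value once and store it in your values file or secret store, since the coordinator and all workers must use the same string.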

Manual TLS configuration#

Warning

We strongly recommend that you use automatic internal TLS as described in the preceding section. Manual TLS configuration is deprecated functionality. Manually configuring TLS for internal communication is complex and requires you to run a certificate manager and manage certificates within the cluster.

All cluster nodes must have a fully qualified domain name (FQDN) that matches the Kubernetes naming scheme. When node.internal-address-source is set to FQDN in both the coordinator.additionalProperties: and worker.additionalProperties: nodes, the chart manages the node.internal-address property automatically, and the SAN field in the TLS certificates must match.

The TLS certificates used must have starburst, coordinator.<namespace>.svc and *.worker.<namespace>.svc in the Subject Alternative Name (SAN) field. Replace <namespace> with the namespace of your deployment.

coordinator:
  additionalProperties: |
    node.internal-address-source=FQDN

worker:
  additionalProperties: |
    node.internal-address-source=FQDN

Non-standard HTTPS port numbers#

If a non-standard port (other than 8443) is used for HTTPS, the same value must be set for both internal.ports.https.port in the YAML file and the http-server.https.port property in config.properties. This is best achieved by adding the setting to the additionalProperties for coordinator and workers as detailed in Using TLS for internal communication.

internal:
  ports:
    https:
      port: 8440
coordinator:
  additionalProperties: |
    http-server.https.port=8440
worker:
  additionalProperties: |
    http-server.https.port=8440

External communications configuration#

Defaults#

The following are the external communications-related defaults in the values.yaml file. Do not place unchanged values in customization files. Instead, follow best practices for creating YAML files for customizing SEP:

expose:
  type: "clusterIp"
  clusterIp:
    name: "starburst"
    ports:
      http:
        port: 8080
  nodePort:
    name: "starburst"
    ports:
      http:
        port: 8080
        nodePort: 30080
    extraLabels: {}
  loadBalancer:
    name: "starburst"
    IP: ""
    ports:
      http:
        port: 8080
    annotations: {}
    sourceRanges: []
  ingress:
    ingressName: "coordinator-ingress"
    serviceName: "starburst"
    servicePort: 8080
    ingressClassName:
    tls:
      enabled: true
      secretName:
    host:
    path: "/"
    annotations: {}

clusterIp type#

expose:
  type: "clusterIp"
  clusterIp:
    name: "starburst"
    ports:
      http:
        port: 8080

nodePort type#

expose:
  type: "nodePort"
  nodePort:
    name: "starburst"
    ports:
      http:
        port: 8080
        nodePort: 30080

loadBalancer type#

expose:
  type: "loadBalancer"
  loadBalancer:
    name: "starburst"
    IP: ""
    ports:
      http:
        port: 8080
    annotations: {}
    sourceRanges: []

Basic ingress type#

expose:
  type: "ingress"
  ingress:
    serviceName: "starburst"
    servicePort: 8080
    tls:
      enabled: true
      secretName:
    host:
    path: "/"
    annotations: {}

ingress with nginx and cert-manager#

nginx is a powerful HTTP and proxy server, commonly used as a load balancer. You can combine it with cert-manager backed by Let’s Encrypt.

As a first step you need to deploy an HTTPS ingress controller for your cluster. You can follow a tutorial from the cert-manager documentation.

With the setup done, and an A record in your DNS zone ready, you can expose the Web UI:

expose:
  type: "ingress"
  ingress:
    serviceName: "starburst"
    servicePort: 8080
    tls:
      enabled: true
      secretName: "tls-secret-starburst"
    host: ""
    path: "/(.*)"
    annotations:
      kubernetes.io/ingress.class: "nginx"
      cert-manager.io/issuer: "letsencrypt-staging"

The secretName is used by cert-manager to store the generated certificate, and can be any value.

The annotations section uses the nginx default value for an installation with a single ingress controller. It assumes that a certificate issuer named letsencrypt-staging exists.

The Ranger user interface can be exposed in exactly the same way:

expose:
  type: "ingress"
  ingress:
    tls:
      enabled: true
      secretName: "tls-secret-ranger"
    host: ""
    path: "/(.*)"
    annotations:
      kubernetes.io/ingress.class: "nginx"
      cert-manager.io/issuer: "letsencrypt-staging"

Coordinator configuration#

Defaults#

The following are the coordinator-related defaults in the values.yaml file. Do not place unchanged values in customization files. Instead, follow best practices for creating YAML files for customizing SEP:

coordinator:
  etcFiles:
    jvm.config: |
      -server
      -XX:-UseBiasedLocking
      -XX:+UseG1GC
      -XX:G1HeapRegionSize=32M
      -XX:+ExplicitGCInvokesConcurrent
      -XX:+ExitOnOutOfMemoryError
      -XX:+HeapDumpOnOutOfMemoryError
      -XX:-OmitStackTraceInFastThrow
      -XX:ReservedCodeCacheSize=512M
      -XX:PerMethodRecompilationCutoff=10000
      -XX:PerBytecodeRecompilationCutoff=10000
      -Djdk.nio.maxCachedBufferSize=2000000
      -Djdk.attach.allowAttachSelf=true
    properties:
      config.properties: |
        coordinator=true
        node-scheduler.include-coordinator=false
        http-server.http.port=8080
        discovery-server.enabled=true
        discovery.uri=http://localhost:8080
      node.properties: |
        node.environment={{ include "starburst.environment" . }}
        node.data-dir=/data/starburst
        plugin.dir=/usr/lib/starburst/plugin
        node.server-log-file=/var/log/starburst/server.log
        node.launcher-log-file=/var/log/starburst/launcher.log
      log.properties: |
        # Enable verbose logging from Starburst Enterprise
        #io.trino=DEBUG
        #com.starburstdata.presto=DEBUG
      password-authenticator.properties: |
        password-authenticator.name=file
        file.password-file=/usr/lib/starburst/etc/auth/password.db
      access-control.properties:
    other: {}
  resources:
    memory: "60Gi"
    requests:
      cpu: 16
    limits:
      cpu: 16
  nodeMemoryHeadroom: "2Gi"
  heapSizePercentage: 90
  heapHeadroomPercentage: 30
  additionalProperties: ""

  envFrom: []
  nodeSelector: {}
  affinity: {}
  tolerations: []
  deploymentAnnotations: {}
  podAnnotations: {}
  priorityClassName:

JVM configuration#

The JVM configuration is included automatically and contains the appropriate memory settings based on the configured resources. In rare cases you might need to add or modify some parameters. In this case you need to include the full default JVM configuration along with the modified values. The following example modifies only G1HeapRegionSize and ReservedCodeCacheSize, but all values must be included.

coordinator:
  etcFiles:
    jvm.config: |
      -server
      -XX:-UseBiasedLocking
      -XX:+UseG1GC
      -XX:G1HeapRegionSize=64M
      -XX:+ExplicitGCInvokesConcurrent
      -XX:+ExitOnOutOfMemoryError
      -XX:+HeapDumpOnOutOfMemoryError
      -XX:-OmitStackTraceInFastThrow
      -XX:ReservedCodeCacheSize=768M
      -XX:PerMethodRecompilationCutoff=10000
      -XX:PerBytecodeRecompilationCutoff=10000
      -Djdk.nio.maxCachedBufferSize=2000000
      -Djdk.attach.allowAttachSelf=true

Adding other files in etc#

You can add any other required configuration file to the etc folder by adding a node with the desired filename in the other section, and using YAML multi-line sections to define the content. The following example adds the two files etc/resource-groups.json and etc/kafka/tpch.customer.json:

coordinator:
  etcFiles:
    other:
      resource-groups.json: |
        {
        <<json_here>>
        }
      kafka/tpch.customer.json: |
        {
        <<json_here>>
        }

CPU and memory allocation#

You have to configure the desired CPU and memory allocation in the resources node. The parameters affect the requested pod size, and are automatically used to determine the memory settings for the JVM running SEP.

coordinator:
  resources:
    memory: "256Gi"
    requests:
      cpu: 32
    limits:
      cpu: 32
  nodeMemoryHeadroom: "4Gi"

Adding properties to config.properties#

Use additionalProperties to append properties to the default configuration file. Values set here override the defaults. For example, you can set query timeout and memory values:

additionalProperties: |
  query.client.timeout=5m
  query.min-expire-age=30m
  query.max-memory-per-node=1GB
  query.max-total-memory-per-node=1GB

Using environment variables for secrets#

envFrom:
  - secretRef:
      name: <<secret_name>>

To use secrets for sensitive credentials in a catalog properties file:

1. Create a secret holding the variables. You can create secrets statically using base64-encoded values of your configuration. Make sure each secret key, which defines the environment variable name, matches the regex pattern [a-zA-Z][a-zA-Z0-9_]*, meaning that only alphanumeric characters and underscores are allowed. The convention is to use all caps and underscores, such as PSQL_USERNAME.

$ echo -n user | base64
$ echo -n pass | base64

apiVersion: v1
kind: Secret
metadata:
  name: variables-secret
type: Opaque
data:
  PSQL_USERNAME: <base64_encoded_user>
  PSQL_PASSWORD: <base64_encoded_pass>

2. Add the secret reference in envFrom for both coordinator and worker to make it accessible on all nodes:

envFrom:
  - secretRef:
      name: variables-secret

3. Reference the variables in properties files using the built-in placeholder pattern enabled by the secrets support.

catalogs:
  postgresql: |
    connector.name=postgresql
    connection-url=jdbc:postgresql://postgresql:5432/postgres
    connection-password=${ENV:PSQL_PASSWORD}
    connection-user=${ENV:PSQL_USERNAME}
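
The data entries of the secret can also be assembled with a small script. The following sketch, with hypothetical function names, checks a key against the pattern given above and prints one entry for the data section with its base64-encoded value.

```shell
#!/bin/sh
# Hypothetical helpers for building the "data" section of a Secret.

# Succeed only if $1 matches [a-zA-Z][a-zA-Z0-9_]* in full.
valid_env_key() {
  printf '%s\n' "$1" | grep -Eq '^[a-zA-Z][a-zA-Z0-9_]*$'
}

# Print one "KEY: <base64 value>" line, indented for the data section.
#   $1: key, $2: plain-text value
secret_data_entry() {
  valid_env_key "$1" || { echo "invalid key: $1" >&2; return 1; }
  printf '  %s: %s\n' "$1" "$(printf '%s' "$2" | base64 | tr -d '\n')"
}

# Usage:
# secret_data_entry PSQL_USERNAME user   # prints "  PSQL_USERNAME: dXNlcg=="
# secret_data_entry PSQL_PASSWORD pass   # prints "  PSQL_PASSWORD: cGFzcw=="
```

Using printf '%s' instead of echo avoids encoding a trailing newline into the value, which has the same effect as the -n flag in the commands above.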

Enabling event logger#

The event logger is enabled and configured in the etcFiles.properties section of the coordinator configuration, which creates the required event-listener.properties file.

coordinator:
  etcFiles:
    properties:
      event-listener.properties: |
        event-listener.name=event-logger
        jdbc.url=jdbc:postgresql://<database hostname>:5432/starburst
        jdbc.user=my_psql_user
        jdbc.password=my_psql_user_password

More information about using and configuring the event logger is available in the event logger documentation.

Enabling Starburst Insights#

Starburst Insights provides a visual overview of important metrics about your cluster as well as a worksheets feature to write and run SQL queries. We strongly suggest reading about the options and capabilities of this tool. To use all of its capabilities, you must also enable event logger.

Insights is enabled and configured using the additionalProperties section of the coordinator configuration:

coordinator:
  additionalProperties: |
    insights.persistence-enabled=true
    insights.metrics-persistence-enabled=true
    insights.jdbc.url=jdbc:postgresql://<database hostname>:5432/starburst
    insights.jdbc.user=<user>
    insights.jdbc.password=<password>
    insights.authorized-users=<superuser>

Our Insights documentation contains a complete description of all Insights configuration properties.

Enabling access control#

SEP has several options for implementing access control. You can add any desired content of the properties file inside the YAML file. For example, you can use the read-only System access control:

coordinator:
  etcFiles:
    properties:
      access-control.properties: |
        access-control.name=read-only

To implement Ranger, use the Apache Ranger Helm chart.

Other access control choices, such as file-based user access control, require configuration in one or more nodes of the SEP Helm chart.

SEP also supports Privacera. Please refer to your Privacera user documentation to learn how to connect that service to your k8s cluster.

Also see: file-based user authentication example.

Worker configuration#

Defaults#

The following are the worker-related defaults in the values.yaml file. Do not place unchanged values in customization files. Instead, follow best practices for creating YAML files for customizing SEP:

worker:
  etcFiles:
    jvm.config: |
      -server
      -XX:-UseBiasedLocking
      -XX:+UseG1GC
      -XX:G1HeapRegionSize=32M
      -XX:+ExplicitGCInvokesConcurrent
      -XX:+ExitOnOutOfMemoryError
      -XX:+HeapDumpOnOutOfMemoryError
      -XX:-OmitStackTraceInFastThrow
      -XX:ReservedCodeCacheSize=512M
      -XX:PerMethodRecompilationCutoff=10000
      -XX:PerBytecodeRecompilationCutoff=10000
      -Djdk.nio.maxCachedBufferSize=2000000
      -Djdk.attach.allowAttachSelf=true
    properties:
      config.properties: |
        coordinator=false
        http-server.http.port=8080
        discovery.uri=http://{{ include "starburst.service.name" . }}:8080
      node.properties: |
        node.environment={{ include "starburst.environment" . }}
        node.data-dir=/data/starburst
        plugin.dir=/usr/lib/starburst/plugin
        node.server-log-file=/var/log/starburst/server.log
        node.launcher-log-file=/var/log/starburst/launcher.log
      log.properties: |
        # Enable verbose logging from Starburst Enterprise
        #io.trino=DEBUG
        #com.starburstdata.presto=DEBUG
    other: {}
  replicas: 2
  autoscaling:
    enabled: false
    minReplicas: 1
    maxReplicas: 100
    targetCPUUtilizationPercentage: 80
  deploymentTerminationGracePeriodSeconds: 300 # 5 minutes
  starburstWorkerShutdownGracePeriodSeconds: 120 # 2 minutes
  resources:
    memory: "100Gi"
    requests:
      cpu: 16
    limits:
      cpu: 16
  nodeMemoryHeadroom: "2Gi"
  heapSizePercentage: 90
  heapHeadroomPercentage: 30
  additionalProperties: ""
  envFrom: []
  nodeSelector: {}
  affinity: {}
  tolerations: []
  deploymentAnnotations: {}
  podAnnotations: {}
  priorityClassName:

Startup script#

Defaults#

The following are the startup script-related defaults in the values.yaml file. Do not place unchanged values in customization files. Instead, follow best practices for creating YAML files for customizing SEP:

initFile: ""
extraArguments: []

Using initFile#

initFile:
extraArguments:
extraSecret:
  name:
  file:

The following example shows how you can use initFile to run a custom init script on the coordinator and workers:

initFile: |
  #!/bin/bash
  echo "Custom init for $1 $2"
  exec /usr/lib/starburst/bin/run-starburst
extraArguments:
  - TEST_ARG

Output on the coordinator:

Custom init for coordinator TEST_ARG
<<starburst_logs>>

Output on a worker:

Custom init for worker TEST_ARG
<<starburst_logs>>

Use initFile: to retrieve and load drivers and other large binaries with curl at startup:

initFile: |
  #!/bin/bash
  echo "Custom init for $1 $2"
  curl https://gdaadmins.blob.core.windows.net/prestoteradatadriver/terajdbc4.jar -o /usr/lib/starburst/plugin/teradata/terajdbc4.jar
  exec /usr/lib/starburst/bin/run-starburst
extraArguments:
  - TEST_ARG
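
When an init script downloads binaries, it can be worth verifying them before SEP starts. The following sketch is an assumption about how you might do this, not part of the chart: verify_sha256 is a hypothetical helper, it relies on sha256sum being available in the image, and the expected checksum has to come from the provider of the file.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the SHA-256 checksum of file $1
# equals the expected hex digest $2. Assumes sha256sum is installed.
verify_sha256() {
  actual=$(sha256sum "$1" | cut -d ' ' -f 1)
  [ "$actual" = "$2" ]
}

# Usage inside an init script, after the curl download and before exec:
# verify_sha256 /path/to/downloaded.jar "<expected_sha256>" || exit 1
```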

Security considerations#

Defaults#

The following are the security-related defaults in the values.yaml file. Do not place unchanged values in customization files. Instead, follow best practices for creating YAML files for customizing SEP:

externalSecrets:
  enabled: false
  type: goDaddy
  secretPrefix: external/
  goDaddy:
    backendType: secretsManager

userDatabase:
  enabled: false
  users:
    - username: admin
      password: thepassword

securityContext: {}

extraSecret:
  name:
  file:

External secret reference#

To configure SEP to use an LDAP certificate through an external secret reference, first create a k8s secret holding the certificate file:

$ kubectl create secret generic ldap-ca --from-file=ca.crt

After the secret is created, reference it in the configuration as follows:

coordinator:
  etcFiles:
    properties:
      password-authenticator.properties: |
        password-authenticator.name=ldap
        ldap.url=ldaps://ldap-server:636
        ldap.user-bind-pattern=uid=${USER},OU=America,DC=corp,DC=example,DC=com
        ldap.ssl-trust-certificate=secretRef:ldap-ca:ca.crt

This mounts the secret named ldap-ca at the path /mnt/secretRef/ldap-ca and replaces occurrences of secretRef:ldap-ca:ca.crt with the absolute path of the file, resulting in the following configuration property setting:

ldap.ssl-trust-certificate=/mnt/secretRef/ldap-ca/ca.crt
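
The substitution can be illustrated with a short sed expression. This is only a sketch of the rewrite described above, not the chart's actual implementation. The function name is hypothetical, and the mount root is taken from the resulting property shown above.

```shell
#!/bin/sh
# Hypothetical illustration: rewrite secretRef:<secret>:<file> references
# read from stdin into the absolute paths of the mounted secret files.
resolve_secret_refs() {
  sed -E 's#secretRef:([^:[:space:]]+):([^[:space:]]+)#/mnt/secretRef/\1/\2#g'
}

# Usage:
# echo 'ldap.ssl-trust-certificate=secretRef:ldap-ca:ca.crt' | resolve_secret_refs
# prints: ldap.ssl-trust-certificate=/mnt/secretRef/ldap-ca/ca.crt
```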

Note

Specific secret values, such as passwords, can be passed into properties files using the envFrom parameters available for coordinator and worker.

Defining external secrets#

You can automatically mount external secrets, for example from the AWS Secrets Manager, using the secretRef or secretEnv notation.

externalSecrets:
  enabled: true # disabled by default
  type: goDaddy
  secretPrefix: <<secret_name_prefix>>
  goDaddy:
    backendType: <<string>>

An example of this configuration:

  1. Create an AWS Secrets Manager secret:

$ aws secretsmanager create-secret --name external/starburst-http-server-port --secret-string 8888

  2. Reference it from your configuration section in config.properties:

coordinator:
  etcFiles:
    properties:
      config.properties: |
        http-server.http.port=secretEnv:external/starburst-http-server-port

  3. Configure the external secrets:

externalSecrets:
  enabled: true
  type: goDaddy
  secretPrefix: external/
  goDaddy:
    backendType: secretsManager

This creates the following external secret manifest:

apiVersion: kubernetes-client.io/v1
kind: ExternalSecret
metadata:
  name: external.starburst-http-server-port
spec:
  backendType: secretsManager
  data:
    - key: external/starburst-http-server-port
      name: EXTERNAL_STARBURST_HTTP_SERVER_PORT

Additionally, the external secrets provider fetches secrets from AWS and creates a k8s secret:

apiVersion: v1
kind: Secret
metadata:
  name: external.starburst-http-server-port
type: Opaque
data:
  EXTERNAL_STARBURST_HTTP_SERVER_PORT: 8888

The k8s secret is now bound to the container as the EXTERNAL_STARBURST_HTTP_SERVER_PORT environment variable. SEP config.properties is resolved to:

http-server.http.port=${ENV:EXTERNAL_STARBURST_HTTP_SERVER_PORT}
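
The names in this example follow a visible pattern: the k8s resource name replaces the slash with a dot, and the environment variable name is the upper-cased key with every non-alphanumeric character turned into an underscore. The sketch below reproduces that mapping for this one example. It is inferred from the manifests above rather than taken from the chart, and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers, inferred from the single example above.

# external/starburst-http-server-port -> external.starburst-http-server-port
external_secret_name() {
  printf '%s\n' "$1" | tr '/' '.'
}

# external/starburst-http-server-port -> EXTERNAL_STARBURST_HTTP_SERVER_PORT
external_secret_env_var() {
  printf '%s' "$1" | tr '[:lower:]' '[:upper:]' | tr -c 'A-Z0-9' '_'
  echo
}

# Usage:
# external_secret_env_var external/starburst-http-server-port
```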

If you have a secret with multiple values, such as a JSON-formatted secret, you can reference the secret values independently.

For example, you may have a secret named external-starburst-creds-mysql that is structured like this in the AWS Secrets Manager:

{
  "MYSQL_USER": "user",
  "MYSQL_PASSWORD": "password"
}

The MYSQL_USER and MYSQL_PASSWORD keys can be referenced in the values.yaml file:

externalSecrets:
  enabled: true
  type: goDaddy
  secretPrefix: external/
  goDaddy:
    backendType: secretsManager

catalogs:
  mysqldb: |-
    connector.name=mysql
    connection-url=jdbc:mysql://<<dns>>:3306
    connection-user=secretEnv:external-starburst-creds-mysql:MYSQL_USER
    connection-password=secretEnv:external-starburst-creds-mysql:MYSQL_PASSWORD

File-based authentication#

Using the htpasswd-generated file:

userDatabase:
  enabled: true
  name: password.db
  users:
    - username: admin
      password: thepassword

Using an externally-created user database:

userDatabase:
  enabled: false

RBAC-enabled clusters#

In the following example, a user steve is configured to work with SEP in a namespace called dev-sandbox:

  1. Create RoleBinding steve-edit to bind the edit ClusterRole to the user in the specific namespace:

kubectl create rolebinding steve-edit \
  --clusterrole edit \
  --user steve \
  --namespace dev-sandbox

  2. If externalSecrets: are in use, a role with the following permissions must additionally be bound to the user:

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: external-secrets-edit
rules:
- apiGroups:
  - kubernetes-client.io
  resources:
  - externalsecrets
  - externalsecrets/status
  verbs:
  - create
  - get
  - list
  - watch
  - update
  - patch
  - delete

kubectl create rolebinding steve-external-secrets-edit \
  --clusterrole external-secrets-edit \
  --user steve \
  --namespace dev-sandbox

The Starburst Enterprise Helm chart does not provide functionality to use a service account for deployments. None of the containers deployed to the cluster needs access to the Kubernetes API.

Performance considerations#

Concurrent query defaults#

The following is the query: default in the values.yaml file. Do not place unchanged values in customization files. Instead, follow best practices for creating YAML files for customizing SEP:

query:
  maxConcurrentQueries: 3

Concurrent query increase example#

query:
  maxConcurrentQueries: 5

Spilling defaults#

The following are the spilling-related defaults in the values.yaml file. Do not place unchanged values in customization files. Instead, follow best practices for creating YAML files for customizing SEP:

spilling:
  enabled: false
  volume:
    emptyDir: {}

Spilling example#

spilling:
  enabled: true
  volume:
    emptyDir: {}

Hive connector storage caching defaults#

The following are the storage caching-related defaults in the values.yaml file. Do not place unchanged values in customization files. Instead, follow best practices for creating YAML files for customizing SEP:

cache:
  enabled: false
  diskUsagePercentage: 80
  ttl: "7d"
  volume:
    emptyDir: {}

Hive connector storage caching example#

cache:
  enabled: true
  diskUsagePercentage: 75
  ttl: "5d"

Catalogs#

Default#

The following is the default catalog entry in the values.yaml file. Do not place unchanged values in customization files. Instead, follow best practices for creating YAML files for customizing SEP:

catalogs:
  tpch: |-
    connector.name=tpch
    tpch.splits-per-node=4

Catalog examples#

The following snippet adds the tpcds-testdata catalog. It uses the TPCDS connector and only specifies the connector name.

catalogs:
  tpcds-testdata: |
    connector.name=tpcds

Multiple catalogs are configured one after the other:

catalogs:
  tpch-testdata: |
    connector.name=tpch
  tpcds-testdata: |
    connector.name=tpcds
  tmpmemory: |
    connector.name=memory
  metrics: |
    connector.name=jmx
  devnull: |
    connector.name=blackhole
  datalake: |
    connector.name=hive
    hive.metastore.uri=thrift://hive:9083
  s3: |
    connector.name=hive
    hive.metastore=glue

The name of each catalog is defined by the name chosen for its node within catalogs. The above example results in catalog names such as tpch-testdata, tpcds-testdata, tmpmemory, metrics, and others. These names are visible to users of the CLI and other tools through SHOW CATALOGS, and potentially in the user interface.

Each catalog properties file can use the configuration options supported by the connector designated by the configured connector.name.

Teradata Direct connector#

The Starburst Teradata Direct connector is supported for Kubernetes deployments in AWS EKS and in Azure AKS. Follow the detailed instructions to configure the necessary networking and components.

Warning

The configuration to use the Starburst Teradata Direct connector on Kubernetes is complex. You need significant Kubernetes and networking knowledge. Contact our Starburst Support team for assistance.

Additional volumes#

Default#

The following is the additionalVolumes: default in the values.yaml file. Do not place unchanged values in customization files:

additionalVolumes: []

Adding volumes examples#

additionalVolumes:
  - path: /mnt/InContainer
    volume:
      emptyDir: {}
  - path: /var/lib/starburst/cache1
    volume:
      hostPath:
        path: /media/nv1/starburst-cache
  - path: /var/lib/starburst/cache2
    volume:
      hostPath:
        path: /media/nv2/starburst-cache

Adding files examples#

For example, to add a file to an existing location such as /usr/lib/starburst/plugin, you can provide the file through a Kubernetes volume source such as a ConfigMap and mount it with subPath at the desired path:

additionalVolumes:
  - path: /usr/lib/starburst/plugin/x.jar
    subPath: x.jar
    volume:
      configMap:
        name: "configmap-in-volume"

In this case, the key named x.jar from the ConfigMap is mounted as that file in the location provided in path.

Add large binaries, such as drivers, at cluster start time using the initFile: top-level node.

Prometheus#

Default#

The following are the prometheus: defaults in the values.yaml file. Do not place unchanged values in customization files. Instead, follow best practices for creating YAML files for customizing SEP:

prometheus:
  enabled: true
  agent:
    version: "0.15.0"
    port: 8081
    config: "/etc/starburst/telemetry/prometheus.yaml"
  rules:
    - pattern: trino.execution<name=QueryManager><>(running_queries|queued_queries)
      name: $1
      attrNameSnakeCase: true
      type: GAUGE
    - pattern: 'trino.execution<name=QueryManager><>FailedQueries\.TotalCount'
      name: 'failed_queries'
      type: COUNTER

Kubernetes management and monitoring#

Default#

The following are the Kubernetes-related defaults in the values.yaml file. Do not place unchanged values in customization files. Instead, follow best practices for creating YAML files for customizing SEP:

readinessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      - curl --max-time 5 -s http://localhost:8080/v1/info | grep \"starting\":false
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 30

livenessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      - curl --max-time 5 -s http://localhost:8080/v1/info | grep \"starting\":false
  initialDelaySeconds: 300
  periodSeconds: 300
  timeoutSeconds: 30
  failureThreshold: 1

commonLabels: {}
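
Both probes succeed once the /v1/info response contains "starting":false. The check can be tried outside the cluster against a sample payload, as in the following sketch. The function name and the sample JSON are assumptions, and a real response contains additional fields.

```shell
#!/bin/sh
# Hypothetical helper: read a /v1/info response from stdin and succeed
# only if it reports that startup has finished, as the probes do.
is_started() {
  grep -q '"starting":false'
}

# Usage with sample payloads (assumed shape of the response):
# echo '{"starting":false}' | is_started && echo ready
# echo '{"starting":true}'  | is_started || echo "not ready"
```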

registry-access.yaml#

You can get started with a minimal file that only adds your credentials for the Starburst Harbor instance:

registryCredentials:
  enabled: true
  registry: harbor.starburstdata.net/starburstdata
  username: <yourusername>
  password: <yourpassword>

sep-prod-catalogs.yaml#

A catalog YAML file adds all the configurations for defining the catalogs and their connection details to the underlying data sources. The following snippet contains a few completely configured catalogs that are ready to use:

  • tpch-testdata exposes the TPC-H benchmark data useful for learning SQL and testing.

  • tmpmemory uses the Memory connector to provide a small temporary test ground for users.

  • metrics uses the JMX connector and exposes the internal metrics of SEP for monitoring and troubleshooting.

  • clientdb uses the Starburst PostgreSQL connector to access the clientdb database.

  • datalake and s3 are stubs of catalogs using the Starburst Hive connector with an HMS and a Glue catalog as metastore.

catalogs:
  tpch-testdata: |
    connector.name=tpch
  tmpmemory: |
    connector.name=memory
  metrics: |
    connector.name=jmx
  clientdb: |
    connector.name=postgresql
    connection-url=jdbc:postgresql://postgresql:5432/clientdb
    connection-password=${ENV:PSQL_PASSWORD}
    connection-user=${ENV:PSQL_USERNAME}
  datalake: |
    connector.name=hive
    hive.metastore.uri=thrift://hive:9083
  s3: |
    connector.name=hive
    hive.metastore=glue

sep-prod-setup.yaml#

This example provides a minimal starting point as a best practice. It achieves the following:

  • environment: provides the name production for the environment, which becomes visible in the Web UI.

  • sharedSecret: sets a shared random secret string for communications between the coordinator and all workers. Note that this is different from the k8s secret created for the license file with the kubectl create secret command.

  • replicas: configures the cluster to use four workers.

  • resources: adjusts the memory and CPU requirements for the workers and the coordinator. In this example, it increases the values for use with more powerful servers than the default.

environment: production
sharedSecret: AN0Qhhw9PsZmEgEXAMPLEkIj3AJZ5/Mnyy5iRANDOMceM+SSV+APSTiSTRING

coordinator:
  resources:
    memory: "256Gi"
    requests:
      cpu: 32
    limits:
      cpu: 32

worker:
  replicas: 4
  resources:
    memory: "256Gi"
    requests:
      cpu: 32
    limits:
      cpu: 32

Warning

The values for memory and CPU resources must reflect your cluster’s available resources. If you attempt to run SEP with the defaults and no nodes with those resources are available, the pods cannot be scheduled and SEP does not start.
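
The three example files can be layered in a single Helm invocation, where values from later files take precedence over values from earlier ones. The sketch below only prints the command. The function name, the release name, and the chart reference in the usage line are placeholders, not values taken from this document.

```shell
#!/bin/sh
# Hypothetical helper: print a helm command that layers several values
# files. Helm applies --values files in order, so later files win.
#   $1: release name, $2: chart reference, remaining args: values files
# Assumption: none of the arguments contain spaces.
helm_upgrade_command() {
  release=$1
  chart=$2
  shift 2
  cmd="helm upgrade --install $release $chart"
  for file in "$@"; do
    cmd="$cmd --values $file"
  done
  printf '%s\n' "$cmd"
}

# Usage (release name and chart reference are placeholders):
# helm_upgrade_command my-sep-cluster <chart> registry-access.yaml sep-prod-setup.yaml sep-prod-catalogs.yaml
```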