Installation

Install Starburst Gateway with Helm.

Use the following instructions to configure and deploy Starburst Gateway.

Configuration

The following sections describe the components you need to configure.

Each component of the Starburst Gateway has a corresponding node in the configuration YAML file.

Backend database

Starburst Gateway requires a MySQL, PostgreSQL, or Oracle database. Database initialization is performed automatically when the Starburst Gateway process starts.
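For illustration, a dataStore node for a MySQL backend might look like the following sketch. The host name, the database name gateway, and the user are placeholders. The JDBC URL uses the standard MySQL Connector/J format, and the password uses the ${ENV:...} environment variable syntax that this page describes for secrets.

```yaml
# Hypothetical dataStore node for a MySQL backend database.
# Host, port, database name, and user are placeholders.
dataStore:
  jdbcUrl: jdbc:mysql://mysql.example.com:3306/gateway
  user: gateway
  # Resolved from the DB_PASSWORD environment variable
  password: ${ENV:DB_PASSWORD}
```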

Starburst clusters

Starburst Galaxy and the current and previous Starburst Enterprise LTS versions are compatible with Starburst Gateway.

Note

Starburst Gateway is not compatible with Trino or Amazon Athena.

SEP configuration

From a user's perspective, Starburst Gateway acts as a transparent proxy for one or more clusters. Apply the following configuration tips to every cluster behind Starburst Gateway.

If all client and server communication is routed through Starburst Gateway, you must enable processing of forwarded HTTP headers on each SEP cluster:

http-server.process-forwarded=true

Without this setting, the first request correctly travels from the user through Starburst Gateway to SEP. However, the next URIs that SEP returns for fetching further query results then use the local URL of the cluster instead of the URL of Starburst Gateway, so all of these follow-up requests bypass Starburst Gateway. If the local URL of the SEP cluster is private to the cluster at the network level, these follow-up calls fail entirely for users.

This setting is also required for SEP to authenticate users when TLS is terminated at Starburst Gateway. By default, SEP refuses to authenticate plain HTTP requests. With http-server.process-forwarded=true, it authenticates over HTTP if the request includes the header X-Forwarded-Proto: HTTPS.

Starburst Galaxy requires setting

routing:
  addXForwardedHeaders: false

to prevent Starburst Gateway from sending X-Forwarded-* headers. Conversely, this setting must be true when proxying clusters that are behind firewalls or otherwise not directly routable from the SQL client.

Secrets in the configuration file

Environment variables can be used as values in the configuration file. For example, set an environment variable on the command line:

export DB_PASSWORD=my-super-secret-pwd

To use this variable in the configuration file, you reference it with the syntax ${ENV:VARIABLE}. For example:

dataStore:
  jdbcUrl: jdbc:postgresql://localhost:5432/gateway
  user: postgres
  password: ${ENV:DB_PASSWORD}

Configure routing rules

Find more information in the routing rules documentation.
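A routing rules file contains one or more rules. The following sketch assumes the MVEL-based rule format used by the open source Trino Gateway project. The rule name, the routing group etl, and the match on the X-Trino-Source header are hypothetical values for illustration. Check the routing rules documentation for the exact format your version supports.

```yaml
---
# Hypothetical rule: route queries whose client source is "airflow"
# to the routing group "etl" (format assumed from Trino Gateway).
name: "airflow"
description: "Route queries from Airflow to the etl routing group"
condition: 'request.getHeader("X-Trino-Source") == "airflow"'
actions:
  - 'result.put("routingGroup", "etl")'
```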

Configure logging

To configure the logging level for various classes, specify the path to the log.properties file by setting log.levels-file in serverConfig.

For additional configurations, use the log.* properties from the logging properties documentation and specify the properties in serverConfig.
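For example, the following sketch sets a levels file and two file logging properties in serverConfig. The paths and the retention value are placeholders. log.path and log.max-history are standard logging properties in Trino-based servers, so verify them against the logging properties documentation for your version.

```yaml
serverConfig:
  # Placeholder path; the file holds one <logger-name>=<LEVEL> entry per line
  log.levels-file: /etc/gateway/log.properties
  # Placeholder location of the server log file
  log.path: /var/log/gateway/server.log
  # Number of rotated log files to keep (placeholder value)
  log.max-history: 30
```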

Configure additional v1/statement-like paths

The SEP client protocol specifies that queries are initiated by a POST request to /v1/statement. Starburst Gateway incorporates this into its routing logic by extracting and recording the query ID from the responses to such requests. If you use an experimental or commercial build of SEP that supports additional endpoints, you can make Starburst Gateway treat them like /v1/statement by listing them under the additionalStatementPaths configuration node. Each path must be absolute, and no path can be a prefix of any other path. The standard /v1/statement path is always included and does not need to be configured. For example:

additionalStatementPaths:
  - '/ui/api/insights/ide/statement'
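The following sketch illustrates the prefix rule by adding a second, hypothetical path. The commented-out entry would be invalid because it is a prefix of an existing path.

```yaml
additionalStatementPaths:
  - '/ui/api/insights/ide/statement'
  # Hypothetical second endpoint: absolute, and neither a prefix of
  # nor prefixed by any other configured path
  - '/custom/api/statement'
  # Invalid: '/ui/api/insights' is a prefix of the first path
  # - '/ui/api/insights'
```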

Configure behind a load balancer

One possible deployment runs multiple instances of Starburst Gateway behind a generic load balancer, such as one from your cloud hosting provider. In this deployment, you must enable processing of forwarded HTTP headers in serverConfig:

serverConfig:
  http-server.process-forwarded: true
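With the Helm chart, the corresponding values might combine several replicas with this setting, as in the following sketch. It assumes the gateway configuration is nested under the chart's config node, and the replica count is a placeholder.

```yaml
# Hypothetical Helm values: several gateway instances behind an
# external load balancer.
replicaCount: 3
config:
  serverConfig:
    http-server.process-forwarded: true
```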

Configure larger proxy response size

Starburst Gateway reads each response from SEP up to a maximum size, 32MB by default. To allow larger responses, increase the limit:

proxyResponseConfiguration:
  responseSize: 50MB

Deploy Starburst Gateway

Use the following steps to deploy Starburst Gateway on Kubernetes with Helm.

The Starburst Gateway Helm chart includes:

  • A config node for general configuration

  • Standard Helm options such as replicaCount, resources, and ingress

Access the Helm chart

The Starburst Gateway Helm chart is available in the Starburst Helm chart project.

Use the following commands to access the chart:

helm registry login harbor.starburstdata.net/starburstdata
# Enter your credentials
helm pull oci://harbor.starburstdata.net/starburstdata/charts/starburst-gateway --version 472.0.0

Create a secret

Use the following command to create a secret for Harbor registry authentication:

kubectl create secret docker-registry harbor-auth \
  --docker-server=harbor.starburstdata.net \
  --docker-username=<your-username> \
  --docker-password=<your-password> \
  --docker-email=<your-email>

Deploy a PostgreSQL database

Use the following command to create a PostgreSQL pod:

kubectl apply -f val-posgres.yaml
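The contents of val-posgres.yaml are not shown here. As a minimal sketch suitable only for testing, the file could define a PostgreSQL Pod and a Service using the official postgres image. The names, image tag, database name, and password are assumptions. The manifest has no persistent storage and keeps the password in plain text.

```yaml
# Hypothetical test-only PostgreSQL Pod: no persistent volume,
# plain-text password.
apiVersion: v1
kind: Pod
metadata:
  name: postgres
  labels:
    app: postgres
spec:
  containers:
    - name: postgres
      image: postgres:16
      env:
        # Environment variables read by the official postgres image
        - name: POSTGRES_DB
          value: gateway
        - name: POSTGRES_USER
          value: postgres
        - name: POSTGRES_PASSWORD
          value: my-super-secret-pwd
      ports:
        - containerPort: 5432
---
# Service giving the Pod a stable DNS name (postgres) in the namespace
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  selector:
    app: postgres
  ports:
    - port: 5432
      targetPort: 5432
```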

Install Starburst Gateway

Use the following command to install Starburst Gateway:

helm upgrade --install starburst-gateway starburst-gateway-472.0.0.tgz -f gateway-config.yaml
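The contents of gateway-config.yaml depend on your environment. The following is a minimal sketch. It assumes that the gateway configuration sits under the chart's config node, that you use the harbor-auth secret created earlier, and that a PostgreSQL Service named postgres runs in the same namespace. All values are placeholders.

```yaml
# Hypothetical gateway-config.yaml values file
replicaCount: 1
imagePullSecrets:
  - name: harbor-auth
config:
  serverConfig:
    http-server.http.port: 8080
  dataStore:
    # Assumes a Service named postgres in the same namespace
    jdbcUrl: jdbc:postgresql://postgres:5432/gateway
    user: postgres
    password: <your-db-password>
  clusterStatsConfiguration:
    monitorType: INFO_API
```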

Access the Starburst Gateway UI

Use the following command to access the Starburst Gateway UI:

kubectl port-forward service/starburst-gateway 8080:8080

Then, access the UI at http://localhost:8080.

Additional options

To implement static routing rules, create a ConfigMap from your routing rules YAML definition:

kubectl create cm routing-rules --from-file your-routing-rules.yaml

Then mount it to your container:

volumes:
  - name: routing-rules
    configMap:
      name: routing-rules
      items:
        - key: your-routing-rules.yaml
          path: your-routing-rules.yaml

volumeMounts:
    - name: routing-rules
      mountPath: "/etc/routing-rules/your-routing-rules.yaml"
      subPath: your-routing-rules.yaml

Ensure that the mountPath matches the rulesConfigPath specified in your configuration. The subPath is optional. If you omit it, mountPath is treated as a directory and the file is mounted at mountPath/<configMap key>.
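The following sketch shows a matching gateway configuration. The routingRules node and its rulesEngineEnabled key are assumed from the open source Trino Gateway configuration. This documentation only names rulesConfigPath.

```yaml
# Hypothetical Helm values enabling file-based routing rules
config:
  routingRules:
    rulesEngineEnabled: true
    # Must match the mountPath in volumeMounts
    rulesConfigPath: /etc/routing-rules/your-routing-rules.yaml
```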

Standard Helm options such as replicaCount, image, imagePullSecrets, service, ingress and resources are supported. These are defined in helm/values.yaml.

Health checks

Starburst Gateway periodically performs health checks and maintains an in-memory health status for each backend. If a backend fails a health check, it is marked as UNHEALTHY, and Starburst Gateway stops routing requests to it.

It is important to distinguish health status from the active/inactive state of a backend. The active/inactive state indicates whether a backend is manually turned on or off, whereas health status is programmatically determined by the health check process. Health checks are only performed on backends that are marked as active.

Starburst recommends using either INFO_API or METRICS for your health check. Other options may be deprecated in the future.

See health status for more details on what each status means.

The type of health check is configured by setting

clusterStatsConfiguration:
  monitorType: ""

to one of the following values.

INFO_API (default)

By default, Starburst Gateway uses the v1/info REST endpoint. A successful check is defined as a 200 response with starting: false. Connection timeout parameters can be defined through the monitor node, for example:

monitor:
  connectTimeoutSeconds: 5
  requestTimeoutSeconds: 10
  idleTimeoutSeconds: 1
  retries: 1

All timeout parameters are optional.

METRICS

This pulls statistics from Trino’s OpenMetrics endpoint. It retrieves the number of running and queued queries for use with the QueryCountBasedRouter (either METRICS or JDBC must be enabled if QueryCountBasedRouter is used).

By default, it uses the trino_execution_name_QueryManager_RunningQueries and trino_execution_name_QueryManager_QueuedQueries metrics to track the number of running and queued queries, respectively. You can configure different metric names as follows:

monitor:
    runningQueriesMetricName: io_starburst_galaxy_name_GalaxyMetrics_RunningQueries
    queuedQueriesMetricName: io_starburst_galaxy_name_GalaxyMetrics_QueuedQueries

Similarly, by default the monitor pulls the metrics using the /metrics endpoint, but it can be updated to use another one:

monitor:
    metricsEndpoint: /v1/metrics

This monitor allows customizing health definitions by comparing metrics to fixed values. This is configured through two maps: metricMinimumValues and metricMaximumValues. The keys of these maps are the metric names, and the values are the minimum or maximum values (inclusive) that are considered healthy. By default, the only metric populated is:

monitor:
    metricMinimumValues:
        trino_metadata_name_DiscoveryNodeManager_ActiveNodeCount: 1

This requires the cluster to have at least one active worker node to be considered healthy. If you configure the map explicitly, your values replace the default map entirely. For example, to increase the minimum worker count to 10 and disqualify clusters that have been experiencing frequent major garbage collections, set:

monitor:
    metricMinimumValues:
        trino_metadata_name_DiscoveryNodeManager_ActiveNodeCount: 10
    metricMaximumValues:
        io_airlift_stats_name_GcMonitor_MajorGc_FiveMinutes_count: 2
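Putting these settings together, a METRICS health check could be configured as in the following sketch. It assumes that clusterStatsConfiguration and monitor are both top-level nodes and that all monitor settings share a single monitor node. Because an explicit metricMinimumValues map replaces the default, you must restate any default entries you still want. The threshold values are placeholders.

```yaml
clusterStatsConfiguration:
  monitorType: METRICS
monitor:
  # Default endpoint, shown explicitly
  metricsEndpoint: /metrics
  # Replaces the default map, so the worker-count rule is restated
  metricMinimumValues:
    trino_metadata_name_DiscoveryNodeManager_ActiveNodeCount: 10
  metricMaximumValues:
    io_airlift_stats_name_GcMonitor_MajorGc_FiveMinutes_count: 2
```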

JDBC

This uses a JDBC connection to query system.runtime tables for cluster information. It is required for the query count based routing strategy. This is recommended over UI_API since it does not restrict the Web UI authentication method of backend clusters. Configure a username and password by adding backendState to your configuration. The username and password must be valid across all backends.

backendState:
  username: "user"
  password: "password"

By default, Starburst Gateway sets the JDBC property explicitPrepare=false. This property was introduced in Trino 431 and sends a prepared statement as a single query instead of a PREPARE/EXECUTE pair. If you use the JDBC health check with older versions of Trino, set:

monitor:
   explicitPrepare: true

The query timeout can be set through

monitor:
    queryTimeout: 10

Other timeout parameters are not applicable to the JDBC connection.
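A complete JDBC health check configuration might combine these settings as in the following sketch. The credentials and the timeout value are placeholders. explicitPrepare: true is only needed for Trino versions older than 431.

```yaml
clusterStatsConfiguration:
  monitorType: JDBC
# Credentials must be valid on every backend cluster
backendState:
  username: "user"
  password: "password"
monitor:
  # Only for clusters based on Trino versions older than 431
  explicitPrepare: true
  queryTimeout: 10
```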

JMX

The monitor type JMX can be used as an alternative to collect cluster information, which is required for the QueryCountBasedRouterProvider. This uses the v1/jmx/mbean endpoint on clusters.

To enable this, you must activate JMX monitoring on all Trino clusters:

jmx.rmiregistry.port=<port>
jmx.rmiserver.port=<port>

Allow access to the JMX endpoint by adding rules to your file-based access control configuration. For example, for a user named user:

{
  "catalogs": [
    {
      "user": "user",
      "catalog": "system",
      "allow": "read-only"
    }
  ],
  "system_information": [
    {
      "user": "user",
      "allow": ["read"]
    }
  ]
}

Ensure that a username and password are configured by adding the backendState section to your configuration. The credentials must be consistent across all backend clusters and have read access to system information.

backendState:
  username: "user"
  password: "password"

The JMX monitor uses these credentials to authenticate against the JMX endpoint of each cluster and collect metrics such as running queries, queued queries, and worker node information.

UI_API

This pulls cluster information from the ui/api/stats REST endpoint. This is supported for legacy reasons and may be deprecated in the future. It is only supported for backend clusters with web-ui.authentication.type=FORM. Set a username and password using backendState as with the JDBC option.

NOOP

This option disables health checks.