Installation#
Install Starburst Gateway with Helm.
Use the following instructions to configure and deploy Starburst Gateway.
Configuration#
The following sections describe the prerequisites and settings that your configuration requires.
Each component of the Starburst Gateway has a corresponding node in the configuration YAML file.
Backend database#
Starburst Gateway requires a MySQL, PostgreSQL, or Oracle database. Database initialization is performed automatically when the Starburst Gateway process starts.
Starburst clusters#
Starburst Galaxy and the current and previous Starburst Enterprise LTS versions are compatible with Starburst Gateway.
Note
Starburst Gateway is not compatible with Trino or Amazon Athena.
SEP Configuration#
From a user's perspective, Starburst Gateway acts as a transparent proxy for one or more clusters. Take the following configuration tips into account for all clusters behind Starburst Gateway.
If all client and server communication is routed through Starburst Gateway, then processing of forwarded HTTP headers must be enabled:
http-server.process-forwarded=true
Without this setting, the first request goes from the user to Starburst Gateway and then to SEP correctly. However, the next URIs that SEP returns for fetching more query results use the local URL of the cluster, and not the URL of Starburst Gateway, so all follow-up requests bypass Starburst Gateway. If the local URL of the SEP cluster is private to the cluster at the network level, these follow-up calls do not work at all for users.
This setting is also required for SEP to authenticate when TLS is terminated at Starburst Gateway. Normally SEP refuses to authenticate plain HTTP requests, but if http-server.process-forwarded=true is set, it authenticates over HTTP when the request includes X-Forwarded-Proto: HTTPS.
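The effect of this setting on follow-up URIs can be pictured with a minimal sketch. It is not SEP's implementation: the function name `next_uri_base`, the example host names, and the reliance on `X-Forwarded-Host` are assumptions made purely for illustration.

```python
def next_uri_base(headers, local_base, process_forwarded):
    """Pick the base URL a server would advertise for follow-up requests.

    Illustrative only: real forwarded-header handling is more involved.
    """
    if not process_forwarded:
        # Forwarded headers are ignored, so clients are sent to the cluster itself.
        return local_base
    proto = headers.get("X-Forwarded-Proto")
    host = headers.get("X-Forwarded-Host")
    if proto and host:
        # The externally visible scheme and host of the proxy are used instead.
        return f"{proto.lower()}://{host}"
    return local_base


# Hypothetical headers as a proxy in front of the cluster might send them.
forwarded = {
    "X-Forwarded-Proto": "https",
    "X-Forwarded-Host": "gateway.example.com",
}
print(next_uri_base(forwarded, "http://sep-internal:8080", process_forwarded=False))
print(next_uri_base(forwarded, "http://sep-internal:8080", process_forwarded=True))
```

With the setting disabled, the sketch returns the cluster-local address, which is the situation that bypasses Starburst Gateway.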
Starburst Galaxy requires setting
routing:
  addXForwardedHeaders: false
to prevent Starburst Gateway from sending X-Forwarded-* headers. This setting is required to be true when proxying clusters that are behind firewalls or otherwise not directly routable from the SQL client.
Secrets in configuration file#
Environment variables can be used as values in the configuration file. You can manually set an environment variable on the command line.
export DB_PASSWORD=my-super-secret-pwd
To use this variable in the configuration file, you reference it with the syntax ${ENV:VARIABLE}. For example:
dataStore:
  jdbcUrl: jdbc:postgresql://localhost:5432/gateway
  user: postgres
  password: ${ENV:DB_PASSWORD}
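To make the substitution concrete, the following sketch resolves `${ENV:VARIABLE}` references in a piece of configuration text. It only illustrates the syntax and is not how Starburst Gateway itself loads its configuration; the function name `resolve_env_refs` and the choice to fail on an unset variable are assumptions.

```python
import os
import re

# Matches references such as ${ENV:DB_PASSWORD}.
_ENV_REF = re.compile(r"\$\{ENV:([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_env_refs(text, env=None):
    """Replace every ${ENV:NAME} in text with the value of NAME."""
    env = os.environ if env is None else env

    def replace(match):
        name = match.group(1)
        if name not in env:
            # Assumption: an unset variable is treated as an error.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _ENV_REF.sub(replace, text)


line = "password: ${ENV:DB_PASSWORD}"
print(resolve_env_refs(line, {"DB_PASSWORD": "my-super-secret-pwd"}))
```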
Configure routing rules#
Find more information in the routing rules documentation.
Configure logging#
To configure the logging level for various classes, specify the path to the log.properties file by setting log.levels-file in serverConfig.
For additional configurations, use the log.* properties from the logging properties documentation and specify the properties in serverConfig.
Configure additional v1/statement-like paths#
The SEP client protocol specifies that queries are initiated by a POST to v1/statement.
The Starburst Gateway incorporates this into its routing logic by extracting and recording the query id from responses to such requests. If you use an experimental or commercial build of SEP that supports additional endpoints, you can cause Starburst Gateway to treat them equivalently to /v1/statement by adding them under the additionalStatementPaths configuration node. They must be absolute, and no path can be a prefix to any other path. The standard /v1/statement path is always included and does not need to be configured.
For example:
additionalStatementPaths:
  - '/ui/api/insights/ide/statement'
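The two constraints on these paths (absolute, and no path a prefix of another) can be checked before deployment with a small script. The sketch below is a hypothetical pre-flight check, not the validation that Starburst Gateway performs; it assumes a plain string-prefix comparison and the function name `validate_statement_paths` is made up.

```python
def validate_statement_paths(paths):
    """Raise ValueError if a path is not absolute or is a prefix of another."""
    for path in paths:
        if not path.startswith("/"):
            raise ValueError(f"path must be absolute: {path!r}")
    for i, first in enumerate(paths):
        for j, second in enumerate(paths):
            # Assumption: "prefix" means a plain string prefix.
            if i != j and second.startswith(first):
                raise ValueError(f"{first!r} is a prefix of {second!r}")


# The example path from the configuration above passes silently.
validate_statement_paths(["/ui/api/insights/ide/statement"])
```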
Configure behind a load balancer#
A possible deployment of Starburst Gateway is to run multiple instances of Starburst Gateway behind another generic load balancer, such as a load balancer from your cloud hosting provider. In this deployment you must enable processing of forwarded HTTP headers in serverConfig:
serverConfig:
  http-server.process-forwarded: true
Configure larger proxy response size#
Starburst Gateway reads the response from SEP in bytes (up to 32MB by default). The limit can be configured by setting:
proxyResponseConfiguration:
  responseSize: 50MB
Deploy Starburst Gateway#
Use the following steps to deploy Starburst Gateway on Kubernetes with Helm.
The Starburst Gateway Helm chart includes:
- A config node for general configuration
- Standard Helm options such as replicaCount, resources, and ingress
Access the Helm Chart#
The Starburst Gateway Helm chart is available in the Starburst Helm chart project.
Use the following commands to access the chart:
helm registry login harbor.starburstdata.net/starburstdata
# Enter your credentials
helm pull oci://harbor.starburstdata.net/starburstdata/charts/starburst-gateway --version 472.0.0
Create a secret#
Use the following command to create a secret for Harbor registry authentication:
kubectl create secret docker-registry harbor-auth \
--docker-server=harbor.starburstdata.net \
--docker-username=<your-username> \
--docker-password=<your-password> \
--docker-email=<your-email>
Deploy a PostgreSQL database#
Use the following command to create a PostgreSQL pod:
kubectl apply -f val-posgres.yaml
Install Starburst Gateway#
Use the following command to install Starburst Gateway:
helm upgrade --install starburst-gateway starburst-gateway-472.0.0.tgz -f gateway-config.yaml
Access the Starburst Gateway UI#
Use the following command to access the Starburst Gateway UI:
kubectl port-forward service/starburst-gateway 8080:8080
Then, access the UI at http://localhost:8080.
Additional options#
To implement static routing rules, create a ConfigMap from your routing rules YAML definition:
kubectl create cm routing-rules --from-file your-routing-rules.yaml
Then mount it to your container:
volumes:
  - name: routing-rules
    configMap:
      name: routing-rules
      items:
        - key: your-routing-rules.yaml
          path: your-routing-rules.yaml
volumeMounts:
  - name: routing-rules
    mountPath: "/etc/routing-rules/your-routing-rules.yaml"
    subPath: your-routing-rules.yaml
Ensure that the mountPath matches the rulesConfigPath specified in your configuration. Note that the subPath is not strictly necessary, and if it is not specified the file is mounted at mountPath/<configMap key>.
Standard Helm options such as replicaCount, image, imagePullSecrets, service, ingress, and resources are supported. These are defined in helm/values.yaml.
Health checks#
Starburst Gateway periodically performs health checks and maintains an in-memory health status for each backend. If a backend fails a health check, it is marked as UNHEALTHY, and Starburst Gateway stops routing requests to it.
It is important to distinguish health status from the active/inactive state of a backend. The active/inactive state indicates whether a backend is manually turned on or off, whereas health status is programmatically determined by the health check process. Health checks are only performed on backends that are marked as active.
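The relationship between the two states can be summarized in a short sketch. The `Backend` class, its field names, and the helper functions are hypothetical and exist only to illustrate the distinction; they do not reflect Starburst Gateway's internal data model.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    active: bool             # turned on or off manually
    unhealthy: bool = False  # set programmatically by the health check


def backends_to_check(backends):
    """Health checks only run against active backends."""
    return [b for b in backends if b.active]


def routable(backends):
    """A backend receives traffic only if it is active and not unhealthy."""
    return [b for b in backends if b.active and not b.unhealthy]


clusters = [
    Backend("adhoc", active=True),
    Backend("etl", active=True, unhealthy=True),
    Backend("maintenance", active=False),
]
print([b.name for b in backends_to_check(clusters)])
print([b.name for b in routable(clusters)])
```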
Starburst recommends using either INFO_API or METRICS for your health check. Other options may be deprecated in the future.
See health status for more details on what each status means.
The type of health check is configured by setting
clusterStatsConfiguration:
  monitorType: ""
to one of the following values.
INFO_API (default)#
By default Starburst Gateway uses the v1/info REST endpoint. A successful check is defined as a 200 response with starting: false. Connection timeout parameters can be defined through the monitor node, for example:
monitor:
  connectTimeoutSeconds: 5
  requestTimeoutSeconds: 10
  idleTimeoutSeconds: 1
  retries: 1
All timeout parameters are optional.
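The success criterion for this check can be expressed as a small predicate. The sketch below is only an approximation for illustration: the function name `info_check_passes` is hypothetical, and it assumes the response body is a JSON object that carries a boolean `starting` field.

```python
import json


def info_check_passes(status_code, body):
    """Return True for a 200 response whose JSON body has starting set to false."""
    if status_code != 200:
        return False
    try:
        info = json.loads(body)
    except json.JSONDecodeError:
        # Assumption: an unparseable body counts as a failed check.
        return False
    return isinstance(info, dict) and info.get("starting") is False


print(info_check_passes(200, '{"starting": false}'))
print(info_check_passes(200, '{"starting": true}'))
print(info_check_passes(503, ""))
```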
METRICS#
This pulls statistics from Trino’s OpenMetrics endpoint.
It retrieves the number of running and queued queries for use with the QueryCountBasedRouter (either METRICS or JDBC must be enabled if QueryCountBasedRouter is used).
By default, it uses the trino_execution_name_QueryManager_RunningQueries and trino_execution_name_QueryManager_QueuedQueries metrics to track the number of running and queued queries respectively; however, these metrics can be configured as follows:
monitor:
  runningQueriesMetricName: io_starburst_galaxy_name_GalaxyMetrics_RunningQueries
  queuedQueriesMetricName: io_starburst_galaxy_name_GalaxyMetrics_QueuedQueries
Similarly, by default the monitor pulls the metrics using the /metrics endpoint, but it can be updated to use another one:
monitor:
  metricsEndpoint: /v1/metrics
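To show what the monitor has to extract from such an endpoint, here is a rough sketch that reads the running and queued query counts from an OpenMetrics-style text payload. The sample payload is invented, the function name `parse_metrics` is hypothetical, and the parser is deliberately simplistic (it would mishandle label values containing spaces), so treat it as an illustration rather than the actual implementation.

```python
def parse_metrics(text):
    """Map metric names to float values from OpenMetrics-style text."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and TYPE/HELP comments
        parts = line.split()
        if len(parts) < 2:
            continue
        name = parts[0].split("{", 1)[0]  # drop any label set
        try:
            values[name] = float(parts[1])
        except ValueError:
            continue
    return values


# Invented sample payload using the default metric names.
sample = """\
# TYPE trino_execution_name_QueryManager_RunningQueries gauge
trino_execution_name_QueryManager_RunningQueries 3
# TYPE trino_execution_name_QueryManager_QueuedQueries gauge
trino_execution_name_QueryManager_QueuedQueries 7
"""
metrics = parse_metrics(sample)
print(metrics["trino_execution_name_QueryManager_RunningQueries"])
print(metrics["trino_execution_name_QueryManager_QueuedQueries"])
```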
This monitor allows customizing health definitions by comparing metrics to fixed values. This is configured through two maps: metricMinimumValues and metricMaximumValues. The keys of these maps are the metric names, and the values are the minimum or maximum values (inclusive) that are considered healthy. By default, the only metric populated is:
monitor:
  metricMinimumValues:
    trino_metadata_name_DiscoveryNodeManager_ActiveNodeCount: 1
This requires the cluster to have at least one active worker node in order to be considered healthy. The map is overwritten if configured explicitly. For example, to increase the minimum worker count to 10 and disqualify clusters that have been experiencing frequent major Garbage Collections, set
monitor:
  metricMinimumValues:
    trino_metadata_name_DiscoveryNodeManager_ActiveNodeCount: 10
  metricMaximumValues:
    io_airlift_stats_name_GcMonitor_MajorGc_FiveMinutes_count: 2
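The inclusive minimum and maximum comparison can be sketched as follows. This is an illustration of the rule described above, not Starburst Gateway's code; the function name `metrics_healthy` is hypothetical, and treating a missing metric as unhealthy is an assumption.

```python
def metrics_healthy(metrics, minimums, maximums):
    """Check observed metric values against inclusive minimum and maximum bounds."""
    for name, minimum in minimums.items():
        value = metrics.get(name)
        # Assumption: a metric that is absent from the response fails the check.
        if value is None or value < minimum:
            return False
    for name, maximum in maximums.items():
        value = metrics.get(name)
        if value is None or value > maximum:
            return False
    return True


minimums = {"trino_metadata_name_DiscoveryNodeManager_ActiveNodeCount": 10}
maximums = {"io_airlift_stats_name_GcMonitor_MajorGc_FiveMinutes_count": 2}

observed = {
    "trino_metadata_name_DiscoveryNodeManager_ActiveNodeCount": 12,
    "io_airlift_stats_name_GcMonitor_MajorGc_FiveMinutes_count": 2,
}
print(metrics_healthy(observed, minimums, maximums))
```

Because the bounds are inclusive, a value exactly equal to the configured limit still counts as healthy in this sketch.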
JDBC#
This uses a JDBC connection to query system.runtime tables for cluster information. It is required for the query count based routing strategy. This is recommended over UI_API since it does not restrict the Web UI authentication method of backend clusters. Configure a username and password by adding backendState to your configuration. The username and password must be valid across all backends.
backendState:
  username: "user"
  password: "password"
Starburst Gateway uses explicitPrepare=false by default. This property was introduced in Trino 431, and uses a single query for prepared statements, instead of a PREPARE/EXECUTE pair. If you are using the JDBC health check option with older versions of Trino, set
monitor:
  explicitPrepare: true
The query timeout can be set through
monitor:
  queryTimeout: 10
Other timeout parameters are not applicable to the JDBC connection.
JMX#
The monitor type JMX can be used as an alternative to collect cluster information, which is required for the QueryCountBasedRouterProvider. This uses the v1/jmx/mbean endpoint on clusters.
To enable this, you must activate JMX monitoring on all Trino clusters:
jmx.rmiregistry.port=<port>
jmx.rmiserver.port=<port>
Allow JMX endpoint access by adding rules to your file-based access control configuration. Example for user:
{
  "catalogs": [
    {
      "user": "user",
      "catalog": "system",
      "allow": "read-only"
    }
  ],
  "system_information": [
    {
      "user": "user",
      "allow": ["read"]
    }
  ]
}
Ensure that a username and password are configured by adding the backendState section to your configuration. The credentials must be consistent across all backend clusters and have read rights on system_information.
backendState:
  username: "user"
  password: "password"
The JMX monitor will use these credentials to authenticate against the JMX endpoint of each cluster and collect metrics like running queries, queued queries, and worker nodes information.
UI_API#
This pulls cluster information from the ui/api/stats REST endpoint. This is supported for legacy reasons and may be deprecated in the future. It is only supported for backend clusters with web-ui.authentication.type=FORM. Set a username and password using backendState as with the JDBC option.
NOOP#
This option disables health checks.