Starburst Enterprise and Kubernetes #
Kubernetes (k8s) support for Starburst Enterprise platform (SEP) allows you to run your SEP clusters and additional components, such as the Hive Metastore Service (HMS) or Apache Ranger. The features of k8s allow you to efficiently create, operate, and scale your clusters, and to adapt them to your workload requirements. Kubernetes support for SEP is implemented with Helm charts.
Kubernetes platform services #
The following Kubernetes platform services are tested regularly and supported:
- Amazon Elastic Kubernetes Service (EKS)
- Google Kubernetes Engine (GKE)
- Microsoft Azure Kubernetes Service (AKS)
- Red Hat OpenShift
This topic focuses on k8s usage and practice that applies to all services.
Other Kubernetes distributions and installations can potentially work if the requirements are fulfilled, but they are not tested and not supported.
Available Helm charts #
The following Helm charts are available as part of the SEP k8s offering:
- Starburst Enterprise includes the coordinator and worker configurations, as well as images, security, catalogs, mounted volumes, and other properties.
- Apache Hive Metastore Service
- Starburst Cache Service to use with Starburst Cached Views
- Apache Ranger, including the Starburst plugin, policy database, and everything you need to integrate Ranger with SEP
Whether or not you are new to Kubernetes, we strongly suggest that you first read about SEP Kubernetes cluster design and our best practices for Helm chart customization in this topic, to learn how SEP uses Helm charts to build the configuration properties files it relies on.
Kubernetes cluster design for Starburst Enterprise #
SEP by its nature is built for performance. How it operates differs from other typical applications running in Kubernetes.
Typically, an enterprise application comprises many stateless microservices, each of which can be run on a small instance. SEP’s exceptional performance comes from its powerful query optimization engine, which expects all nodes to be identically sized for query planning. It also depends on each node to have large amounts of memory, to allow parallel processing within a node as well as processing of large amounts of data per node.
Once work is divided up among worker nodes, it is not redistributed if a node dies, as doing so would negate any performance gains. SEP coordinator and worker nodes are therefore stateful, and clusters by design rely on fewer, larger nodes.
Ideally, SEP runs in a namespace dedicated to it alone. Separate pods can be defined for worker and coordinator nodes in that namespace, and taints and tolerations can be defined for node selection in SEP, as in the sketch below.
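For example, the following is a hypothetical override that pins workers to dedicated, tainted nodes. It assumes the chart exposes the standard Kubernetes nodeSelector and tolerations settings for workers, and that the nodes carry a dedicated=sep label and a matching taint; check your chart version's values for the exact keys:
worker:
  # Schedule workers only on nodes labeled dedicated=sep
  nodeSelector:
    dedicated: sep
  # Tolerate the taint that keeps other workloads off those nodes
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "sep"
      effect: "NoSchedule"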
You must review the SEP Kubernetes requirements before you begin installing the Helm charts to ensure that you have the correct credentials in place and understand sizing requirements.
Configuring SEP with Helm charts #
SEP uses a number of configuration files internally that determine how it behaves:
- etc/catalog/<catalog name>.properties
- etc/config.properties
- etc/jvm.config
- etc/log.properties
- etc/node.properties
- etc/access-control.properties
With our Kubernetes deployment, these files are built using Helm charts, and are nested in the coordinator.etcFiles YAML node.
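For illustration, the following is a trimmed sketch of how such overrides can be expressed under that node. The keys and values shown are hypothetical; the exact layout and defaults vary by chart version, so consult the chart's values.yaml for the authoritative structure:
coordinator:
  etcFiles:
    # JVM flags for the coordinator process
    jvm.config: |
      -server
      -XX:+UseG1GC
    properties:
      # Trino configuration properties
      config.properties: |
        query.max-memory-per-node=50GB
      # Logging levels
      log.properties: |
        io.trino=INFO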
Catalog properties files are also built using Helm charts. They are defined under the top-level catalogs: YAML node. The catalogs and properties you create depend on the data sources you connect to with SEP, and you can configure as many as you need.
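For example, a single catalog definition can be as small as the following snippet; a fuller set of catalogs appears later in this topic:
catalogs:
  tpch-testdata: |
    connector.name=tpch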
Customization best practices #
Every Helm-based deployment includes a values.yaml file, and SEP is no exception. It contains the default values, any and all of which can be overridden. Along with basic instance configuration for the various cloud platforms, values.yaml also includes the key-value pairs necessary to build the required *.properties files that configure SEP.
Our SEP configuration properties reference for Kubernetes contains the complete list of available configuration options.
A recommended set of customization files is included later in this topic, including recommendations for creating specific override files, with examples.
Default YAML files #
Default values are provided for a minimal configuration only, not including security or any catalog connector properties, as these vary by customer needs. The configuration in the Helm chart also contains deployment information such as registry credentials and instance size.
Each new release of SEP includes a new chart version, and the default values may change. For this reason, we highly recommend that you follow best practices and leave the values.yaml file in the chart untouched, overriding only very specific values as needed in one or more separate YAML files.
Do not edit the values.yaml file in the chart, and do not copy it in its entirety, make changes, and specify the copy as the override file, as new releases may change default values or render your values non-performant. Use separate files with changed values only, as described above.
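To review the chart's built-in defaults without editing them, you can print them with Helm's show values subcommand, replacing 4XX.0.0 with your chart version as elsewhere in this topic:
helm show values starburstdata/starburst-enterprise --version 4XX.0.0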
Using version control #
We very strongly recommend that you manage your customizations in a version control system such as a git repository. Manage each cluster and deployment in separate files, as sketched below.
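For example, a hypothetical repository layout for a prod and a test cluster, using the file naming conventions recommended later in this topic:
sep-helm-overrides/
  registry-access.yaml
  sep-prod-catalogs.yaml
  sep-prod-setup.yaml
  sep-test-setup.yaml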
Creating and using YAML files #
The following is a snippet of default values from the SEP values.yaml file embedded in the Helm chart:
coordinator:
  resources:
    memory: "60Gi"
    requests:
      cpu: 16
The default 60Gi of required memory is potentially more than any node in your cluster can provide. As a result, the default prevents your deployment from succeeding, because no suitable node is available to schedule the pod.
To create a customization that overrides the default size for a test cluster, copy and paste only that section into a new file named sep-test-setup.yaml, and make your changes. You must also include the relevant structure above that section. The memory settings for workers have the same default values and need to be overridden as well:
coordinator:
  resources:
    memory: "10Gi"
    requests:
      cpu: 2
worker:
  resources:
    memory: "10Gi"
    requests:
      cpu: 2
Store the new file in a path accessible from the helm upgrade --install command. When you are ready to install, specify the new file using the --values argument, as in the following example. Replace 4XX.0.0 with the Helm chart version of the desired SEP release, as documented on the versions page:
helm upgrade my-sep-test-cluster starburstdata/starburst-enterprise \
  --install \
  --version 4XX.0.0 \
  --values ./registry-access.yaml \
  --values ./sep-test-setup.yaml
You can chain as many override files as you need. If a value appears in multiple files, the value in the rightmost, last-specified file takes precedence, as in the example below. It is typically useful to limit both the number of files and the size of each individual file. For example, it can be useful to keep all catalog definitions in one separate file.
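A hypothetical illustration of that precedence, using two override files (base-setup.yaml and test-overrides.yaml are illustrative names) that both set the worker count:
# base-setup.yaml
worker:
  replicas: 2

# test-overrides.yaml
worker:
  replicas: 3
Passing --values ./base-setup.yaml --values ./test-overrides.yaml results in a cluster with three workers, because the last file specified wins.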
To view the built-in configuration of the Helm chart for a specific version of SEP, run the following command:
helm template starburstdata/starburst-enterprise --version 4XX.0.0
Use this command with different version values to compare the configuration of different SEP releases as part of your upgrade process.
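For example, one hypothetical comparison workflow, where 4XX.0.0 and 4XY.0.0 stand in for the current and the target chart versions:
helm template starburstdata/starburst-enterprise --version 4XX.0.0 > sep-current.yaml
helm template starburstdata/starburst-enterprise --version 4XY.0.0 > sep-next.yaml
diff sep-current.yaml sep-next.yaml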
To generate the specific configuration files for your deployment, use the template command with your additional values files:
helm template starburstdata/starburst-enterprise \
  --version 4XX.0.0 \
  --values ./registry-access.yaml \
  --values ./sep-test-setup.yaml
When the generated files meet your needs, proceed with the helm upgrade command with your files specified using the --values argument.
Recommended customization file set #
The recommended file set described below consists of a series of focused configuration files. If you have more than one cluster, such as a test cluster and a production cluster, name the files accordingly before you begin. Examples are provided in the sections that follow.
File name | Content |
---|---|
registry-access.yaml | Docker registry access credentials file, typically to access the Docker registry on the Starburst Harbor instance. Include the registryCredentials: or imagePullSecrets: top-level node in this file to configure access to the Docker registry. This file can be used for all SEP, HMS, and Ranger configuration for all clusters you operate. |
sep-prod-catalogs.yaml | Catalog configuration for all catalogs configured for SEP on the prod cluster. It is typically useful to separate catalog configuration into its own file to allow reuse across clusters, and to keep the large amount of catalog configuration apart from the rest of the cluster configuration. |
sep-prod-setup.yaml | Main configuration file for the prod cluster. Include the configuration for all other top-level nodes that configure the coordinator, workers, and all other aspects of the cluster. |
If you are operating multiple clusters, create and manage additional configuration files while reusing the credentials file. For example, if you run a dev and a stage cluster, use the following additional files:
- sep-dev-catalogs.yaml
- sep-dev-setup.yaml
- sep-stage-catalogs.yaml
- sep-stage-setup.yaml
There are several supporting services available for use with SEP, each with its own Helm chart:
- Starburst cache service
- Hive Metastore Service
- Apache Ranger
If you opt to use these services, you can create a configuration file for each of these per cluster as well:
Production prod cluster:
- cache-service-prod.yaml
- hms-prod.yaml
- ranger-prod.yaml
Development dev cluster:
- cache-service-dev.yaml
- hms-dev.yaml
- ranger-dev.yaml
Staging stage cluster:
- cache-service-stage.yaml
- hms-stage.yaml
- ranger-stage.yaml
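Each of these services is installed with its own helm upgrade invocation, reusing the shared credentials file. A hypothetical example for the HMS on the prod cluster; the chart name starburstdata/starburst-hive and release name my-hms-prod are assumptions, so verify them against your environment:
helm upgrade my-hms-prod starburstdata/starburst-hive \
  --install \
  --version 4XX.0.0 \
  --values ./registry-access.yaml \
  --values ./hms-prod.yaml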
registry-access.yaml #
You can get started with a minimal file that only adds your credentials to the Starburst Harbor instance, as shown below:
registryCredentials:
  enabled: true
  registry: harbor.starburstdata.net/starburstdata
  username: <yourusername>
  password: <yourpassword>
In the examples throughout our documentation, this file is named registry-access.yaml.
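If you prefer not to keep credentials inline, the imagePullSecrets: top-level node mentioned in the table above can reference an existing Kubernetes registry secret instead. A minimal sketch, assuming a docker-registry secret named regcred already exists in the namespace:
imagePullSecrets:
  - name: regcred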
sep-prod-catalogs.yaml #
A catalog YAML file adds all the configurations for defining the catalogs and their connection details to the underlying data sources. The following snippet contains a few completely configured catalogs that are ready to use:
- tpch-testdata exposes the TPC-H benchmark data, which is useful for learning SQL and testing.
- tmpmemory uses the Memory connector to provide a small, temporary test ground for users.
- metrics uses the JMX connector and exposes the internal metrics of SEP for monitoring and troubleshooting.
- clientdb uses the Starburst PostgreSQL connector to access the clientdb database.
- datalake and s3 are stubs of catalogs using the Starburst Hive connector, with an HMS and a Glue catalog as metastore, respectively.
catalogs:
  tpch-testdata: |
    connector.name=tpch
  tmpmemory: |
    connector.name=memory
  metrics: |
    connector.name=jmx
  clientdb: |
    connector.name=postgresql
    connection-url=jdbc:postgresql://postgresql.example.com:5432/clientdb
    connection-password=${ENV:PSQL_PASSWORD}
    connection-user=${ENV:PSQL_USERNAME}
  datalake: |
    connector.name=hive
    hive.metastore.uri=thrift://hive:9083
  s3: |
    connector.name=hive
    hive.metastore=glue
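The clientdb catalog reads its credentials from environment variables on the SEP pods via the ${ENV:...} syntax. As one hypothetical approach, the values can be stored in a Kubernetes secret such as the following (the name psql-credentials is illustrative); how you expose the variables to the SEP containers depends on your deployment configuration:
apiVersion: v1
kind: Secret
metadata:
  name: psql-credentials
type: Opaque
stringData:
  # Illustrative placeholders; replace with real credentials
  PSQL_USERNAME: <yourusername>
  PSQL_PASSWORD: <yourpassword>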
sep-prod-setup.yaml #
This example provides a minimal starting point as a best practice. It achieves the following:
- environment: provides the name production for the environment, which becomes visible in the Web UI.
- sharedSecret: sets a shared random secret string for communications between the coordinator and all workers. NOTE: This is different from the shared secret set for the license file with the kubectl create secret command.
- replicas: configures the cluster to use four workers.
- resources: adjusts the memory and CPU requirements for the workers and the coordinator. In this example, it increases the values for use with more powerful servers than the default.
environment: production
sharedSecret: AN0Qhhw9PsZmEgEXAMPLEkIj3AJZ5/Mnyy5iRANDOMceM+SSV+APSTiSTRING
coordinator:
  resources:
    memory: "256Gi"
    requests:
      cpu: 32
worker:
  replicas: 4
  resources:
    memory: "256Gi"
    requests:
      cpu: 32
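With these files in place, the installation follows the same pattern shown earlier; my-sep-prod-cluster is an illustrative release name:
helm upgrade my-sep-prod-cluster starburstdata/starburst-enterprise \
  --install \
  --version 4XX.0.0 \
  --values ./registry-access.yaml \
  --values ./sep-prod-catalogs.yaml \
  --values ./sep-prod-setup.yaml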