Starburst Enterprise with Kubernetes overview#
The Kubernetes support for Starburst Enterprise platform (SEP) allows you to run your SEP clusters and additional components such as your Hive Metastore Service or Apache Ranger. The features of Kubernetes (k8s) allow you to efficiently create, operate and scale your clusters and adapt them to your workload requirements.
Familiarity with Kubernetes and Helm is strongly suggested to understand all aspects of our documentation.
The following Kubernetes cluster services are tested regularly and supported:
Amazon Elastic Kubernetes Service (EKS)
Google Kubernetes Engine (GKE)
Microsoft Azure Kubernetes Service (AKS)
Red Hat OpenShift
This chapter aims to cover all the above, focusing on k8s usage that applies to all services.
Other Kubernetes distributions and installations can potentially work if the requirements are fulfilled, but they are not tested and not supported.
Available Helm charts#
The following Helm charts are available as part of the SEP k8s offering:
SEP Helm chart for images, security, coordinator and worker nodes, as well as catalogs, mounted volumes and other properties
Apache Ranger plugin including the Starburst plugin, policy database and everything you need to integrate Ranger with SEP
Apache Hive Metastore Service
Whether you are new to Kubernetes or not, we strongly suggest that you first read about SEP Kubernetes cluster design and our best practices guide for Helm chart customization. Together they explain how SEP uses Helm charts to build the configuration properties files it relies on.
Kubernetes cluster design for Starburst Enterprise#
SEP by its nature is built for performance. How it operates differs from other typical applications running in Kubernetes.
Typically, an enterprise application comprises many stateless microservices, each of which can be run on a small instance. SEP’s exceptional performance comes from its powerful query optimization engine, which expects all nodes to be identically sized for query planning. It also depends on each node to have large amounts of memory, to allow parallel processing within a node as well as processing of large amounts of data per node.
Once work is divided up among worker nodes, it is not redirected if a node dies, as this would negate any performance gains. SEP coordinator and worker nodes are therefore stateful, and by design rely on fewer, larger nodes.
Ideally SEP runs within a namespace dedicated to it and it alone. Separate pods can be defined for worker nodes and coordinator nodes in that namespace, and taints and tolerations can be defined for node selection in SEP.
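As an illustration of the node selection described above, the following is a minimal sketch of an override file that pins coordinator and worker pods to dedicated nodes. It assumes the SEP chart accepts standard Kubernetes nodeSelector and tolerations entries under the coordinator and worker keys. The file name, the starburst-pool label and the dedicated=sep taint are hypothetical and need to match how your own nodes are labeled and tainted.

```yaml
# sep-node-placement.yaml (hypothetical file name)
# Assumption: the chart exposes nodeSelector and tolerations for coordinator and worker.
# The label and taint names are illustrative only.
coordinator:
  nodeSelector:
    starburst-pool: coordinator
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "sep"
      effect: "NoSchedule"
worker:
  nodeSelector:
    starburst-pool: worker
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "sep"
      effect: "NoSchedule"
```

The toleration syntax itself is standard Kubernetes. Only the placement of these keys in the chart values is an assumption to verify against the values.yaml of your chart version.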
Configuring SEP with Helm charts#
SEP uses a number of configuration files that determine how it behaves. With our Kubernetes deployment, these files are built using Helm charts.
The necessary catalog properties files depend on what data sources you connect to with SEP. You can configure as many as you need. In our Kubernetes deployments, these catalog properties files are built from the values in your YAML configuration files.
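A minimal sketch of how catalog definitions might look in such a YAML file follows. It assumes the chart reads a top-level catalogs map in which each key becomes a catalog name and each value becomes the content of the matching properties file. The file name, host name and credentials are placeholders.

```yaml
# sep-catalogs.yaml (hypothetical file name)
# Assumption: each entry under "catalogs" is rendered to <name>.properties.
catalogs:
  tpch: |
    connector.name=tpch
  salesdb: |
    connector.name=postgresql
    connection-url=jdbc:postgresql://postgres.example.internal:5432/sales
    connection-user=sep_reader
    connection-password=changeme
```

In practice, credentials are better supplied through a secrets mechanism than stored in plain text in a values file.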
Customization best practices#
Every Helm-based deployment includes a values.yaml file, and SEP is no exception. As with any Helm-based deployment, it contains the default values, any and all of which can be overridden.
Along with basic instance configuration for the various cloud platforms, values.yaml also includes the key-value pairs necessary to build the *.properties files that configure SEP.
Default YAML files#
Default values are provided for a minimum configuration only, not including security or any catalog connector properties, as these vary by customer needs. The configuration in the Helm chart also contains deployment information such as registry credentials and instance size.
Each new release of SEP includes new charts, and the default values may change. For this reason, we highly recommend that you follow best practices and leave the values.yaml file in the chart untouched, overriding only very specific values as needed in one or more separate YAML files.
Do not change the values.yaml file in the chart, and do not copy it in its entirety, edit the copy and specify that copy as the override file. New releases may change default values or render your values non-performant. Use separate files with changed values only, as described in Creating and using YAML files.
Using version control#
We very strongly recommend that you manage your customizations in a version control system. Each cluster and deployment has to be managed in separate files.
Creating and using YAML files#
The following is a snippet of default values from the SEP values.yaml file embedded in the Helm chart:
coordinator:
  resources:
    requests:
      memory: "60Gi"
      cpu: 16
    limits:
      memory: "60Gi"
      cpu: 16
The default memory request of 60Gi is potentially larger than the allocatable memory of any node in your cluster. In that case the deployment cannot succeed with the defaults, because Kubernetes cannot schedule the pod on any node.
To create a customization that overrides the default size for a test cluster, copy and paste only that section into a new file named sep-test-setup.yaml, and make any changes. You must also include the parent keys above that section so that the structure matches values.yaml. The memory settings for workers have the same default values and need to be overridden as well:
coordinator:
  resources:
    requests:
      memory: "10Gi"
      cpu: 2
    limits:
      memory: "10Gi"
      cpu: 2
worker:
  resources:
    requests:
      memory: "10Gi"
      cpu: 2
    limits:
      memory: "10Gi"
      cpu: 2
Store the new file in a path accessible from the location where you run the helm upgrade --install command.
When you are ready to install, specify the new file using the --values argument as in the following example:
helm upgrade my-sep-test-cluster starburstdata/starburst-enterprise \
  --install \
  --version 350.0.0 \
  --values ./registry-access.yaml \
  --values ./sep-test-setup.yaml
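The command above references a registry-access.yaml file whose contents are not shown on this page. As a rough sketch only, such a file might hold the registry credentials mentioned earlier. The registryCredentials key and its fields are an assumption about the chart's values, and all values are placeholders, so check the values.yaml shipped with your chart version for the exact names.

```yaml
# registry-access.yaml (contents assumed, not taken from this page)
# Assumption: the chart reads registry credentials from a "registryCredentials" block.
registryCredentials:
  enabled: true
  registry: "<your-registry-host>"
  username: "<your-username>"
  password: "<your-password>"
```

Keeping credentials in their own file makes it easier to exclude them from shared repositories or to manage them separately from sizing and catalog overrides.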
You can chain as many override files as you need. If a value appears in multiple files, the value in the last specified (right-most) file takes precedence. Typically it is useful to limit the number of files as well as the size of the individual files. For example, it can be useful to create a separate file that contains all catalog definitions.
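The precedence rule can be illustrated with two small override files and the values that result when both are passed to Helm. The second file name is hypothetical. Helm merges nested maps key by key, so a later file only replaces the keys it actually sets.

```yaml
# File 1: sep-test-setup.yaml (specified first)
coordinator:
  resources:
    requests:
      memory: "10Gi"
      cpu: 2
---
# File 2: sep-test-more-memory.yaml (hypothetical, specified last)
coordinator:
  resources:
    requests:
      memory: "16Gi"
---
# Effective values after Helm merges both files over the chart defaults
coordinator:
  resources:
    requests:
      memory: "16Gi"   # taken from the last specified file
      cpu: 2           # kept from the first file, not set in the second
```

Note that this key-by-key merging applies to maps. A list set in a later file replaces the earlier list as a whole.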