Starburst Enterprise with Kubernetes overview#

Kubernetes support for Starburst Enterprise platform (SEP) allows you to run your SEP clusters, along with additional components such as the Hive Metastore Service or Apache Ranger, on Kubernetes (k8s). The features of k8s allow you to efficiently create, operate, and scale your clusters and adapt them to your workload requirements.

Note

Familiarity with Kubernetes and Helm is strongly recommended to understand all aspects of our documentation.

Kubernetes platforms#

The following Kubernetes cluster services are tested regularly and supported:

  • Amazon Elastic Kubernetes Service (EKS)

  • Google Kubernetes Engine (GKE)

  • Microsoft Azure Kubernetes Service (AKS)

  • Red Hat OpenShift

This chapter aims to cover all the above, focusing on k8s usage that applies to all services.

Other Kubernetes distributions and installations can potentially work if the requirements are fulfilled, but they are not tested and not supported.

Available Helm charts#

The following Helm charts are available as part of the SEP k8s offering:

Whether you are new to Kubernetes or not, we strongly suggest that you first read about SEP Kubernetes cluster design and our best practices guide for Helm chart customization to learn how SEP uses them to build the configuration properties files it relies on.

Kubernetes cluster design for Starburst Enterprise#

SEP by its nature is built for performance. How it operates differs from other typical applications running in Kubernetes.

Typically, an enterprise application comprises many stateless microservices, each of which can be run on a small instance. SEP’s exceptional performance comes from its powerful query optimization engine, which expects all nodes to be identically sized for query planning. It also depends on each node having a large amount of memory, which allows parallel processing within a node as well as processing of large amounts of data per node.

Once work is divided up among worker nodes, it is not redirected if a node dies, as this would obviate any performance gains. SEP coordinator and worker nodes are therefore stateful, and by design rely on fewer, larger nodes.

Ideally SEP runs within a namespace dedicated to it and it alone. Separate pods can be defined for worker nodes and coordinator nodes in that namespace, and taints and tolerations can be defined for node selection in SEP.
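As a minimal sketch of the node selection described above, an override file could pin the coordinator and worker pods to nodes reserved for SEP. This sketch assumes that the chart exposes nodeSelector and tolerations keys under coordinator and worker, and that the reserved nodes carry a label and taint of dedicated=sep. The file name, label, and taint are illustrative and not taken from the SEP documentation, so check them against the values.yaml of your chart version:

```yaml
# sep-node-placement.yaml (hypothetical file name)
# Assumption: the chart supports nodeSelector and tolerations under
# coordinator and worker; verify this in the chart's values.yaml.
# Assumption: the nodes reserved for SEP are labeled and tainted with
# dedicated=sep (illustrative names).
coordinator:
  nodeSelector:
    dedicated: sep
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "sep"
      effect: "NoSchedule"
worker:
  nodeSelector:
    dedicated: sep
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "sep"
      effect: "NoSchedule"
```

The toleration entries follow the standard Kubernetes pod specification. The taint keeps other workloads off the reserved nodes, while the node selector keeps SEP pods on them.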

You must review the SEP Kubernetes requirements before you begin installing the Helm charts to ensure that you have the correct credentials in place and understand sizing requirements.

Configuring SEP with Helm charts#

SEP uses a number of configuration files that determine how it behaves:

  • etc/catalog/<catalog name>.properties

  • etc/config.properties

  • etc/jvm.properties

  • etc/log.properties

  • etc/node.properties

  • etc/access-control.properties

With our Kubernetes deployment, these files are built using Helm charts.

The catalog properties files you need depend on the data sources you connect to with SEP. You can configure as many as you need. Like the other configuration files, they are generated from the values in your YAML files.
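To illustrate how a catalog properties file can be expressed in YAML, the following sketch defines two catalogs. It assumes that the chart reads catalogs from a top-level catalogs key and turns each entry into etc/catalog/<catalog name>.properties. The file name and that key are assumptions, so confirm them against the values.yaml of your chart version:

```yaml
# catalogs.yaml (hypothetical file name)
# Assumption: each entry under "catalogs" becomes
# etc/catalog/<catalog name>.properties, with the block text as file content.
catalogs:
  tpch: |
    connector.name=tpch
  tpcds: |
    connector.name=tpcds
```

Both catalogs use built-in benchmark connectors that need no external data source, which makes them convenient for a first test of a new cluster.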

Customization best practices#

Every Helm-based deployment includes a values.yaml file, and SEP is no exception. It contains the default values, any of which can be overridden.

Along with basic instance configuration for the various cloud platforms, values.yaml also includes the key-value pairs necessary to build the required *.properties files that configure SEP.

Read our SEP configuration guide for Kubernetes to learn how the configuration options are structured. The guide provides concrete recommendations for creating specific override files, with examples.

Default YAML files#

Default values are provided for a minimum configuration only. They do not include security or catalog connector properties, as these vary by customer needs. The configuration in the Helm chart also contains deployment information such as registry credentials and instance size.

Each new release of SEP includes new charts, and the default values may change. For this reason, we highly recommend that you follow best practices and leave the values.yaml file in the chart untouched, overriding only very specific values as needed in one or more separate YAML files.

Warning

Do not change the values.yaml file in the chart. Also do not copy it in its entirety, modify the copy, and specify that copy as the override file. New releases may change default values or render your values non-performant. Use separate files that contain only the changed values, as described above.

Using version control#

We strongly recommend that you manage your customizations in a version control system. Manage each cluster and deployment in separate files.

Creating and using YAML files#

The following is a snippet of default values from the SEP values.yaml file embedded in the Helm chart:

coordinator:
  resources:
    requests:
      memory: "60Gi"
      cpu: 16
    limits:
      memory: "60Gi"
      cpu: 16

The default memory request of 60Gi is potentially larger than the memory available on any node in your cluster. In that case Kubernetes cannot schedule the pod, since no suitable node is available, and the deployment fails.

To create a customization that overrides the default size for a test cluster, copy and paste only that section into a new file named sep-test-setup.yaml, and make any changes. You must also include the relevant structure above that section. The memory settings for workers have the same default values and need to be overridden as well:

coordinator:
  resources:
    requests:
      memory: "10Gi"
      cpu: 2
    limits:
      memory: "10Gi"
      cpu: 2
worker:
  resources:
    requests:
      memory: "10Gi"
      cpu: 2
    limits:
      memory: "10Gi"
      cpu: 2

Store the new file in a path accessible from the helm upgrade --install command.

When you are ready to install, specify the new file using the --values argument as in the following example:

helm upgrade my-sep-test-cluster starburstdata/starburst-enterprise \
  --install \
  --version 350.0.0 \
  --values ./registry-access.yaml \
  --values ./sep-test-setup.yaml

You can chain as many override files as you need. If a value appears in multiple files, the value in the last specified (right-most) file takes precedence. It is typically useful to limit both the number of files and the size of each individual file. For example, it can be useful to create a separate file that contains all catalog definitions.
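The precedence rule can be sketched with two override files that set the same key. The file names are hypothetical, and the three YAML documents below are shown together only for comparison. Helm merges nested maps key by key, so the later file replaces only the keys it actually sets:

```yaml
# File passed first: base.yaml (hypothetical file name)
coordinator:
  resources:
    requests:
      memory: "10Gi"
      cpu: 2
---
# File passed last: bigger-coordinator.yaml (hypothetical file name)
coordinator:
  resources:
    requests:
      memory: "20Gi"
---
# Effective values for these keys after Helm merges both files
coordinator:
  resources:
    requests:
      memory: "20Gi"   # taken from the last specified file
      cpu: 2           # kept from the first file, since the last file omits it
```

Only the keys from these two files are shown. Chart defaults still apply to every key that neither file sets. Unlike maps, lists are not merged: a list set in a later file replaces the earlier list as a whole.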