Starburst Enterprise with Kubernetes requirements#

Kubernetes (k8s) and its related tools are vibrant open source projects. Their rapid pace of change, combined with commercial extensions and modifications, results in a myriad of features and options.

Starburst Enterprise platform (SEP) on Kubernetes cannot support all of these variations. The following sections detail the specific requirements for SEP deployments on k8s.

K8s cluster requirements#

The following k8s versions are supported:

  • 1.20

  • 1.19

  • 1.18

As a result the following services can be used:

  • EKS

  • GKE

  • AKS

  • OpenShift 4.x or higher

In all cases, only standard k8s tools are supported for the clusters and for the usage and deployment described here.
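As a rough sketch, a pre-flight check against the version list above could look like the following. The function name is hypothetical, and the version string is assumed to come from your own cluster, for example from the server version reported by kubectl version.

```shell
# Hypothetical helper: succeeds when the given k8s version is 1.18, 1.19 or 1.20.
# Accepts strings such as "v1.19.7", "1.20" or "v1.20.7-eks-d88609".
is_supported_k8s_version() {
  version="${1#v}"            # drop an optional leading "v"
  major="${version%%.*}"      # text before the first dot
  rest="${version#*.}"        # text after the first dot
  minor="${rest%%[!0-9]*}"    # leading digits only, drops ".7", "+", "-eks-..."
  case "$major.$minor" in
    1.18|1.19|1.20) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage:
#   is_supported_k8s_version "v1.19.7" && echo "supported"
```

The check only compares the major and minor version, since the list above does not distinguish patch releases.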

The nodes in the k8s cluster need to fulfill the following requirements:

  • 64 to 256 GB RAM

  • 16 to 64 cores

  • all nodes are identical

  • each node is dedicated to one SEP worker or coordinator only

  • nodes are not shared with other applications running in the cluster

  • x86_64 or ARM (Graviton) processor architecture

Warning

Resource sharing on a node or pod is not supported. Each node must host only one coordinator or worker pod. Nodes and pods cannot be shared with other applications. SEP uses significant resources on each node and performs best with exclusive access to the underlying memory and CPU resources. It also performs best when all worker nodes are identical, which allows the coordinator to plan and optimize query processing against a predictable pool of resources and processing power.

The recommended approach to achieve this is to use a dedicated cluster or namespace for all SEP nodes. You can read more about this in our cluster design guidelines.

If you must place SEP in a cluster shared with other applications, you must guarantee exclusive node access for SEP. In this case, you can use nodegroups, taints and tolerations, or pod affinity and anti-affinity. This approach is not recommended, since it is more complex to implement and can easily degrade performance, but it can be used by experienced k8s administrators.
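For the shared-cluster case, a minimal sketch of the taint side of a taints-and-tolerations setup follows. The taint and label dedicated=sep, the function name, and the node name are illustrative assumptions and not values required by SEP. The KUBECTL variable is only a convenience so that the commands can be previewed by setting it to echo.

```shell
# Use the real kubectl by default; set KUBECTL=echo to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: reserve one node for SEP pods.
dedicate_node_to_sep() {
  node="$1"
  # Repel every pod that does not carry a matching toleration.
  $KUBECTL taint nodes "$node" dedicated=sep:NoSchedule
  # Label the node so that SEP pods can select it.
  $KUBECTL label nodes "$node" dedicated=sep
}

# Example usage (node name is a placeholder):
#   dedicate_node_to_sep node-1
```

The SEP coordinator and worker pods then need a matching toleration and node selector, which are configured in the values file of the chart. Check the chart's default values for the exact keys.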

We recommend that you take advantage of our SEP performance tuning training video for in-depth information on topics such as cluster and machine sizing, workload tuning and resource management to help you make informed choices while planning your implementation.

Scaling requirements#

If you plan to use automatic scaling of your SEP deployment, additional required components need to be installed:

Reference the documentation of the above tools for installation instructions.

Automatic scaling adds and removes worker nodes based on demand. This differs from the commonly used horizontal scaling, where new pods are started on existing nodes, because each worker requires a full dedicated node. You need to ensure that your k8s cluster supports this addition of nodes and has access to the required resources.

Access requirements#

Access to SEP from outside the cluster using the Trino CLI, or any other application, requires the coordinator to be available via HTTPS and a DNS hostname.

This can be achieved with an external load balancer and DNS. The load balancer terminates HTTPS and forwards the requests as HTTP inside the cluster.

Alternatively, you can configure a DNS service for your k8s cluster and configure ingress appropriately.

Installation tool requirements#

  • kubectl, version identical to the k8s cluster version

  • helm, version 3.2.4 or newer

In addition, we strongly recommend Octant to simplify cluster workload visualization and management. The Octant Helm plugin can simplify usage further.
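The Helm version requirement above can be checked with a small script. The following is a sketch only: the function names are hypothetical, and it assumes a sort implementation that supports version ordering with -V, as GNU coreutils does.

```shell
# Hypothetical helper: strip the leading "v" and any "+build" suffix,
# for example "v3.2.4+g0ad800e" becomes "3.2.4".
normalize_helm_version() {
  v="${1#v}"
  echo "${v%%+*}"
}

# Hypothetical helper: succeeds when version $1 is greater than or equal to $2.
version_ge() {
  lowest="$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)"
  [ "$lowest" = "$2" ]
}

# Example usage with the locally installed helm:
#   installed="$(normalize_helm_version "$(helm version --short)")"
#   version_ge "$installed" 3.2.4 || echo "helm $installed is older than 3.2.4"
```

A plain string comparison would rank 3.10.0 below 3.2.4, which is why the sketch relies on version-aware sorting.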

Note

Check out our Helm troubleshooting tips.

Helm chart repository#

The Helm charts and Docker images required for deployment and operation are available in the Starburst Harbor instance at https://harbor.starburstdata.net.

Customer-specific user accounts to access Harbor are available from Starburst.

Installation and usage requires you to add the Helm repository on Harbor:

helm repo add \
  --username yourusername \
  --password yourpassword \
  starburstdata \
  https://harbor.starburstdata.net/chartrepo/starburstdata

Confirm success by listing the repository with the following command:

$ helm repo list
NAME           URL
starburstdata  https://harbor.starburstdata.net/chartrepo/starburstdata

If you search the repository, the available charts are listed:

$ helm search repo
NAME                                CHART VERSION   APP VERSION   DESCRIPTION
starburstdata/starburst-hive        364.3.0                       Helm chart for Apache Hive
starburstdata/starburst-enterprise  364.3.0         1.0           A Helm chart for Starburst Enterprise
starburstdata/starburst-ranger      364.3.0                       Apache Ranger

After new releases from Starburst, you have to update the repository:

$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "starburstdata" chart repository
Update Complete. ⎈ Happy Helming!⎈

Docker registry#

The Helm charts reference the Docker registry on Harbor to download the relevant Docker images.

Create a values.yml for each Helm chart you want to install in a cluster. This file contains the configuration for the specific chart in the specific cluster.

At a minimum, you need one YAML file for the registry credentials and a separate YAML file for each chart and each cluster.

More details, including using your own Docker registry, are available in the Docker image and registry section for the Helm chart for SEP.
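As an illustration of the file layout described above, the following sketch writes a registry credentials file from the shell. The function name, the file names, and the release name are hypothetical. The registryCredentials keys are an assumption about the chart's values structure, so verify them against the chart's default values, for example with helm show values, before use.

```shell
# Hypothetical helper: write a registry credentials values file to the given path.
# The keys below are an assumption; the credentials are placeholders.
write_registry_values() {
  cat > "$1" <<'EOF'
registryCredentials:
  enabled: true
  registry: harbor.starburstdata.net/starburstdata
  username: yourusername
  password: yourpassword
EOF
}

# Example usage: one shared credentials file plus one file per chart and cluster.
#   write_registry_values registry-access.yaml
#   helm upgrade --install sep starburstdata/starburst-enterprise \
#     --values registry-access.yaml \
#     --values sep-prod.yaml
```

Keeping the credentials in their own file allows the same file to be passed to every chart, while the chart-specific files stay free of secrets.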

License#

If you intend to use features of SEP, you need to get a license file from Starburst and configure it.

RBAC-enabled clusters#

Kubernetes by default provides the ClusterRole edit. This role includes all necessary permissions to deploy and work with Helm charts that consist of common resources. SEP uses one additional custom type through use of the externalSecrets: node. However, this custom resource definition must be deployed to the cluster with its own chart. It is only required in specific use cases, as in this example.
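On an RBAC-enabled cluster, granting the edit ClusterRole in a single namespace could be sketched as follows. The binding name sep-edit, the function name, and the namespace and user in the usage example are hypothetical. Setting KUBECTL to echo previews the command instead of running it.

```shell
# Use the real kubectl by default; set KUBECTL=echo to preview the command.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: bind the built-in "edit" ClusterRole to a user
# within one namespace only.
grant_edit_role() {
  namespace="$1"
  user="$2"
  $KUBECTL create rolebinding sep-edit \
    --clusterrole=edit \
    --user="$user" \
    --namespace="$namespace"
}

# Example usage, followed by a permission check:
#   grant_edit_role starburst jane
#   kubectl auth can-i create deployments --namespace starburst --as jane
```

A RoleBinding that references a ClusterRole limits the granted permissions to the namespace of the binding, which fits the dedicated namespace approach.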

Helm troubleshooting tips#

Here are some things to keep in mind as you implement SEP with Kubernetes and Helm in your organization.

  • Helm is space-sensitive and tabs are not valid. Our default values.yaml file uses 2-space indents. Ensure that your code editor preserves the correct indentation when you copy and paste text.

  • If your YAML files fail to parse, Helm provides several tools to debug this issue.

  • Sometimes problems can arise from improperly formatted YAML files. SEP uses Helm to build its *.properties files from YAML both with YAML key-value pairs and with multi-line strings. We recommend that you review the YAML techniques used in Helm to ensure that you are comfortable with using the Helm multi-line strings feature, and to understand the importance of consistent indentation.

  • Units can also be a source of confusion. We strongly suggest that you pay close attention to any units regarding memory and storage. In general, units provided for multi-line strings that feed into SEP configuration files are in traditional metric bytes, such as megabytes (MB) and gigabytes (GB), as used by SEP. However, machine sizing and other values are in binary multiples, such as mebibytes (Mi) and gibibytes (Gi), since these are used directly by Helm and Kubernetes.
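To make the difference between the two unit systems in the last tip concrete, the following sketch converts both to bytes. The function names are hypothetical, and the sketch assumes a shell with 64-bit integer arithmetic.

```shell
# Hypothetical helpers: convert a whole number of units to bytes.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))   # binary multiple (Gi)
}

gb_to_bytes() {
  echo $(( $1 * 1000 * 1000 * 1000 ))   # metric multiple (GB)
}

# Example usage:
#   gib_to_bytes 64   # prints 68719476736
#   gb_to_bytes 64    # prints 64000000000
```

A value of 64Gi is therefore about 4.7 GB larger than a value of 64GB, which matters when a memory setting in a SEP configuration file has to fit within the resources requested from Kubernetes.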