Starburst Enterprise with Kubernetes requirements#
Kubernetes (k8s) and its related tools are fast-moving open source projects. Their pace of development, together with commercial extensions and modifications, results in frequent change and a wide range of features and options.
Starburst Enterprise platform (SEP) on Kubernetes cannot support all of these variations. The following sections detail the specific requirements for SEP deployments on k8s.
K8s cluster requirements#
The following k8s versions are supported:
As a result the following services can be used:
OpenShift 4.x or higher
In all cases the clusters, and the described usage and deployment, only support standard k8s tools.
The nodes in the k8s cluster need to fulfill the following requirements:
64 to 256 GB RAM
16 to 64 cores
all nodes are identical
each node is dedicated to one SEP worker or coordinator only
nodes are not shared with other applications running in the cluster
x86_64 or ARM (Graviton) processor architecture
Resource sharing on a node or pod is not supported. Each node must host only one coordinator or worker pod. Nodes and pods cannot be shared with other applications. SEP performs best with exclusive access to the underlying memory and CPU resources, since it uses significant resources on each node. It also performs best when all worker nodes are identical, which allows the coordinator to plan and optimize query processing against a predictable pool of resources and processing power.
The recommended approach to achieve this is to use a dedicated cluster or namespace for all SEP nodes. You can read more about this in our cluster design guidelines.
If you must place SEP in a cluster shared with other applications, ensure that SEP has guaranteed exclusive access to its nodes. In this case, you can use node groups, taints and tolerations, or pod affinity and anti-affinity to achieve this. This approach is not recommended, since it is more complex to implement and can easily degrade performance, but experienced k8s administrators can use it.
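As a minimal sketch of the taint-based option, the following script prints the `kubectl` commands that label and taint a set of nodes so that only pods with a matching toleration are scheduled on them. The key `dedicated`, the value `sep`, and the node names are hypothetical placeholders, and the script only prints the commands rather than running them.

```shell
#!/usr/bin/env bash
# Sketch: print the kubectl commands that would reserve nodes for SEP.
# The key "dedicated", the value "sep", and the node names are
# hypothetical placeholders; adjust them to your cluster.

TAINT_KEY="dedicated"
TAINT_VALUE="sep"

reserve_node_commands() {
  local node="$1"
  # Label the node so that SEP pods can select it.
  echo "kubectl label nodes ${node} ${TAINT_KEY}=${TAINT_VALUE} --overwrite"
  # Taint the node so that pods without a matching toleration stay off it.
  echo "kubectl taint nodes ${node} ${TAINT_KEY}=${TAINT_VALUE}:NoSchedule --overwrite"
}

for n in sep-node-1 sep-node-2; do
  reserve_node_commands "${n}"
done
```

A taint alone only keeps other workloads away. The SEP pods still need a matching toleration and a node selector or affinity rule in their chart values so that they land on the reserved nodes.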
We recommend that you take advantage of our SEP performance tuning training video for in-depth information on topics such as cluster and machine sizing, workload tuning and resource management to help you make informed choices while planning your implementation.
If you plan to use automatic scaling of your SEP deployment, additional required components need to be installed:
Reference the documentation of the above tools for installation instructions.
The automatic scaling adds and removes worker nodes based on demand. This differs from the commonly used horizontal scaling where new pods are started on existing nodes, and is a result of the fact that workers require a full dedicated node. You need to ensure that your k8s cluster supports this addition of nodes and has access to the required resources.
Access to SEP from outside the cluster using the Trino CLI, or any other application, requires the coordinator to be available via HTTPS and a DNS hostname.
This can be achieved with an external load balancer and DNS that terminates HTTPS and forwards the requests over HTTP inside the cluster.
Alternatively, you can configure a DNS service for your k8s cluster and configure ingress appropriately.
Installation tool requirements#
Check out our Helm troubleshooting tips.
Helm chart repository#
The Helm charts and Docker images required for deployment and operation are available in the Starburst Harbor instance at https://harbor.starburstdata.net.
Customer-specific user accounts to access Harbor are available from Starburst.
Installation and usage requires you to add the Helm repository on Harbor:
helm repo add \
  --username yourusername \
  --password yourpassword \
  starburstdata \
  https://harbor.starburstdata.net/chartrepo/starburstdata
Confirm success by listing the repository with the following command:
$ helm repo list
NAME            URL
starburstdata   https://harbor.starburstdata.net/chartrepo/starburstdata
If you search the repository, the available charts are listed:
$ helm search repo
NAME                                 CHART VERSION   APP VERSION   DESCRIPTION
starburstdata/starburst-hive         367.0.0                       Helm chart for Apache Hive
starburstdata/starburst-enterprise   367.0.0         1.0           A Helm chart for Starburst Enterprise
starburstdata/starburst-ranger       367.0.0                       Apache Ranger
After new releases from Starburst, you have to update the repository:
$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "starburstdata" chart repository
Update Complete. ⎈ Happy Helming!⎈
The Helm charts reference the Docker registry on Harbor to download the relevant Docker images.
You need to create a values.yml file for each Helm chart you want to install in a cluster.
This file contains the configuration for the specific chart in the specific cluster.
As a minimum you need the YAML file for registry credentials and a separate YAML file for each chart and each cluster.
More details, including using your own Docker registry, are available in the Docker image and registry section for the Helm chart for SEP.
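The file layout described above might look like the following sketch: one YAML file with the registry credentials, one YAML file for a specific chart in a specific cluster, and a Helm command that passes both. The file names, the release name `sep-prod`, and every YAML key shown (`registryCredentials`, `worker.replicas`) are assumptions for illustration and need to be checked against the documented values of the chart. The script only writes the files to a temporary directory and prints the Helm command.

```shell
#!/usr/bin/env bash
# Sketch: one credentials file plus one values file per chart and cluster.
# File names, the release name "sep-prod", and all YAML keys are assumptions;
# verify them against the chart's documented values before use.

WORK_DIR="$(mktemp -d)"

# Registry credentials, shared by every chart that pulls images from Harbor.
cat > "${WORK_DIR}/registry-access.yaml" <<'EOF'
registryCredentials:
  enabled: true
  registry: harbor.starburstdata.net/starburstdata
  username: yourusername
  password: yourpassword
EOF

# Settings for one chart in one cluster.
cat > "${WORK_DIR}/sep-prod.yaml" <<'EOF'
worker:
  replicas: 3
EOF

# Build the command; later --values files override earlier ones.
HELM_CMD="helm upgrade sep-prod starburstdata/starburst-enterprise --install"
HELM_CMD="${HELM_CMD} --version 367.0.0"
HELM_CMD="${HELM_CMD} --values ${WORK_DIR}/registry-access.yaml"
HELM_CMD="${HELM_CMD} --values ${WORK_DIR}/sep-prod.yaml"

echo "${HELM_CMD}"
```

Keeping the credentials in their own file means they can be reused for every chart and kept out of the cluster-specific files.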
You need to ensure you get a license file from Starburst and configure it, if you intend to use features of SEP.
Kubernetes by default provides the ClusterRole edit. This role includes all necessary permissions to deploy and work with Helm charts that consist of common resources. SEP uses one additional custom type through use of the externalSecrets: node. However, this custom resource definition must be deployed to the cluster with its own chart. It is only required in specific use cases, as in this example.
Helm troubleshooting tips#
Here are some things to keep in mind as you implement SEP with Kubernetes and Helm in your organization.
Helm is space-sensitive and tabs are not valid. Our default values.yaml file uses 2-space indents. Ensure that your code editor preserves the correct indentation when you copy and paste text.
If your YAML files fail to parse, Helm provides several tools to debug this issue.
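One quick check before reaching for Helm itself is to scan a values file for tab characters. The following is a minimal sketch; the function name `check_no_tabs` and the demo file content are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: fail fast if a values file contains tab characters.
# The function name and the demo content are hypothetical.

check_no_tabs() {
  local file="$1"
  local tab
  tab="$(printf '\t')"
  # grep -n prints each offending line with its line number.
  if grep -n "${tab}" "${file}"; then
    echo "ERROR: tab characters found in ${file}" >&2
    return 1
  fi
  echo "OK: no tabs in ${file}"
}

# Demo with a throwaway file that uses 2-space indents.
DEMO_FILE="$(mktemp)"
printf 'worker:\n  replicas: 2\n' > "${DEMO_FILE}"
check_no_tabs "${DEMO_FILE}"
```

For problems beyond tabs, `helm lint` and `helm template --debug` are the usual starting points, since they check the chart and render the templates with your values without installing anything.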
Sometimes problems can arise from improperly formatted YAML files. SEP uses Helm to build its *.properties files from YAML, both with YAML key-value pairs and with multi-line strings. We recommend that you review the YAML techniques used in Helm to ensure that you are comfortable with using the Helm multi-line strings feature, and to understand the importance of consistent indentation.
Units can also be a source of confusion. We strongly suggest that you pay close attention to any units for memory and storage. In general, units provided in multi-line strings that feed into SEP configuration files are in traditional metric bytes, such as megabytes (MB) and gigabytes (GB), as used by SEP. However, machine sizing and other values are in binary multiples, such as mebibytes (Mi) and gibibytes (Gi), since these are used directly by Helm and Kubernetes.
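To illustrate why the distinction matters, the following sketch compares the same number read as a decimal multiple and as a binary multiple. It assumes the decimal reading of GB described above and uses 64 only as an example value.

```shell
#!/usr/bin/env bash
# Sketch: compare a decimal (GB) and a binary (Gi) reading of the same number.
# SIZE is an arbitrary example value.

SIZE=64

DECIMAL_BYTES=$(( SIZE * 1000 * 1000 * 1000 ))   # 64 GB as a decimal multiple
BINARY_BYTES=$(( SIZE * 1024 * 1024 * 1024 ))    # 64 Gi as a binary multiple
DIFF_BYTES=$(( BINARY_BYTES - DECIMAL_BYTES ))

echo "${SIZE} GB = ${DECIMAL_BYTES} bytes"
echo "${SIZE} Gi = ${BINARY_BYTES} bytes"
echo "difference = ${DIFF_BYTES} bytes"
```

At this size the two readings differ by more than 4 Gi, which is enough to matter when memory settings in configuration files are sized against the memory requested for a pod.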