EKS cluster creation#

This topic covers creating and networking the cluster where Starburst Enterprise platform (SEP) is installed to ensure that all SEP resources are co-located and follow best practices.

Note

This topic assumes that you have read and have a working knowledge of the following topics:

After your cluster is created as described in this topic, you are ready to begin installing SEP.

Prerequisites#

The following tools, policies, and certificates are required to create an SEP cluster in EKS:

  • helm

  • kubectl

  • eksctl version 0.54.0 or later

  • IAM policies for Glue and S3, as desired

  • CA-signed certificate for HTTPS/TLS (e.g., for starburst.example.com) if using AD/LDAP authentication
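
You can confirm that the command-line tools are installed, and that eksctl meets the minimum version, by checking their versions from a shell. The exact output depends on your installed versions:

$ eksctl version
$ kubectl version --client
$ helm version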

The following example IAM add-on policies must be present for each node group:

managedNodeGroups:
  - name: SEP-MANAGED-NODE-GROUP-EXAMPLE
    iam:
      withAddonPolicies:
        externalDNS: true
        albIngress: true

Warning

SEP has specific requirements for sizing, placement, and sharing of resources. You must ensure that your EKS cluster meets all requirements described in our cluster requirements section. In addition, you must use only EC2-based instance types.

Create your sep_eks_cluster.yaml file#

Your YAML file should start with the following two lines:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

Next, add the metadata: section to describe your cluster. The following example shows the minimum required fields, as well as suggested tags for a staging cluster running in us-east-2:

metadata:
  name: my-sep-cluster
  region: us-east-2
  version: "VERSION_NUMBER"
  tags:
    cloud: aws
    environment: staging
    info: "EKS cluster for Starburst Enterprise staging environment"
    user: USER_NAME

Replace the following:

  • VERSION_NUMBER with the Kubernetes version for your EKS cluster, for example "1.20".

  • USER_NAME with the name of your user tag.
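
For example, with the placeholders replaced, the metadata: section for the staging cluster in this guide might look like the following. The Kubernetes version and the user tag value shown here are illustrative; use the values appropriate for your environment:

metadata:
  name: my-sep-cluster
  region: us-east-2
  version: "1.20"
  tags:
    cloud: aws
    environment: staging
    info: "EKS cluster for Starburst Enterprise staging environment"
    user: jane.doe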

Specify your networking#

Add the two required AZs, and the existing subnets associated with them:

vpc:
  subnets:
     private:
       us-east-2a:
         id: subnet-0Subnet1ID8String2
       us-east-2b:
         id: subnet-0Subnet0ID2String3

For the purposes of this example, SEP is set up in the us-east-2a AZ. While AWS EKS requires two AZs to be defined, best practice is to use only one of them to set up your managed node groups for SEP.

Note

AWS requires EKS clusters to have a minimum of two availability zones (AZs), but SEP best practices require that the coordinator and workers reside in a single AZ in a single subnet.

Continue configuring your DNS and ingress access as determined in the EKS networking guide.

Create your EKS managedNodeGroups:#

Starburst recommends using managedNodeGroups: to create the pools of instances available to SEP and its associated services. managedNodeGroups: in EKS have the additional benefit of automating SIGTERM delivery to SEP workers and the coordinator when a Spot instance is reclaimed, which enables graceful shutdown. With nodeGroups:, additional development outside of SEP is required to allow for graceful shutdown.

Caution

The interruption mechanism of Spot instances can cause query failures. Review the Spot instances guidance before deploying.

In this example, a managed node group called SEP-SERVICES-MANAGED-NODE-GROUP-EXAMPLE is created alongside the coordinator and worker managed node groups. This managed node group runs the Hive Metastore Service (HMS) and Ranger, if they are required in your environment.

The coordinator is required for a functioning SEP cluster, and the SEP architecture allows only one coordinator at a time. You must therefore maximize the coordinator's availability. This can be done in two ways:

  • Create a separate node group for the coordinator with two or more instances on cold standby. If the instance the coordinator is running on goes down, having two or more instances allows the coordinator pod to be restarted on an unaffected instance.

  • Define a priority class for the coordinator. If the coordinator's instance is lost, the cluster can evict a worker pod on an instance of the same size and start the coordinator there without waiting for a new node to start. The worker's termination grace period, which is five minutes by default, is still respected. A minimal priority class sketch follows this list.
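
The following is a minimal sketch of a Kubernetes priority class that the coordinator pod can reference. The name and value are illustrative, and how the class is assigned to the coordinator pod depends on your Helm configuration:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  # Illustrative name and value; align them with however your Helm values
  # assign a priorityClassName to the coordinator pod.
  name: sep-coordinator-priority
value: 1000000
globalDefault: false
description: "Priority class intended for the SEP coordinator pod"

Pods without a priority class default to priority 0, so a worker pod can be preempted in favor of a coordinator pod that references this class.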

Warning

AWS best practices typically recommend creating a cluster with three node groups, each in a different availability zone (AZ) to tolerate the loss of an AZ. This approach comes with significant performance and cost penalties for SEP, as internode traffic crosses availability zones. We strongly recommend all node groups remain within the same AZ, with failure tolerance handled by using multiple clusters.

EKS requires the first three IAM policy ARNs shown in the following example for each node group. The fourth policy ARN, for Glue, illustrates using an additional policy to allow access to EKS, S3, and Glue in the account used for the EKS cluster without supplying credentials. It is not required.

Note

You must repeat all tags specified in the metadata tags in every entry in managedNodeGroups:.

managedNodeGroups:
  - name: coordinator
    tags:
      cloud: aws
      environment: staging
      info: "EKS cluster for Starburst Enterprise staging environment"
      user: USER_NAME
    # Ensure all managed node groups have the same availabilityZones setting.
    availabilityZones: [us-east-2a]
    labels:
      allow: coordinator
    # It is recommended not to use Spot instances for the coordinator.
    spot: false
    instanceType: m5.8xlarge
    desiredCapacity: 1
    minSize: 1
    maxSize: 1
    privateNetworking: true
    # The following ssh key section is optional and can be removed if not desired.
    # It allows you to ssh directly to the EKS workers.
    ssh:
      allow: true
      publicKeyName: PUBLIC_KEY_NAME
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        # The following policy is optional and can be removed if not desired.
        - arn:aws:iam::POLICY_ID
  - name: workers
    tags:
      cloud: aws
      environment: staging
      info: "EKS cluster for Starburst Enterprise staging environment"
      user: USER_NAME
    # Ensure all managed node groups have the same availabilityZones setting.
    availabilityZones: [us-east-2a]
    labels:
      allow: workers
    spot: false
    instanceType: m5.8xlarge
    # If Spot instances are desired, comment out or remove the two lines above,
    # then uncomment the two lines below.
    #spot: true
    #instanceTypes: ["m5.8xlarge", "m5a.8xlarge", "m5ad.8xlarge"]
    #
    # The following can be adjusted to the desired number of workers for your environment.
    desiredCapacity: 2
    minSize: 2
    maxSize: 4
    privateNetworking: true
    # The following ssh key section is optional and can be removed if not desired.
    # It allows you to ssh directly to the EKS workers.
    ssh:
      allow: true
      publicKeyName: PUBLIC_KEY_NAME
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        # The following policy is optional and can be removed if not desired.
        - arn:aws:iam::POLICY_ID
  - name: SEP-SERVICES-MANAGED-NODE-GROUP-EXAMPLE
    tags:
      cloud: aws
      environment: staging
      info: "EKS cluster for Starburst Enterprise staging environment"
      user: USER_NAME
    # Ensure all managed node groups have the same availabilityZones setting.
    availabilityZones: [us-east-2a]
    labels:
      allow: support
    spot: false
    instanceType: m5.xlarge
    # If Spot instances are desired, comment out or remove the two lines above,
    # then uncomment the two lines below.
    #spot: true
    #instanceTypes: ["m5.xlarge", "m5a.xlarge", "m5ad.xlarge"]
    desiredCapacity: 1
    minSize: 1
    maxSize: 1
    privateNetworking: true
    # The following ssh key section is optional and can be removed if not desired.
    # It allows you to ssh directly to the EKS workers.
    ssh:
      allow: true
      publicKeyName: PUBLIC_KEY_NAME
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        # The following policy is optional and can be removed if not desired.
        - arn:aws:iam::POLICY_ID

Replace the following:

  • USER_NAME with the name of your user tag.

  • PUBLIC_KEY_NAME with the name of your public ssh key.

  • POLICY_ID with the IAM policy id.
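
Before creating any resources, you can optionally validate the file. Recent eksctl releases support a --dry-run flag that prints the fully expanded cluster configuration without creating anything; if your eksctl version does not support it, skip this step:

$ eksctl create cluster -f sep_eks_cluster.yaml --dry-run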

Spot instances#

AWS EC2 Spot is a cost optimization and scaling tool that allows you to deploy resources on Spot instances, with the caveat that they can be reclaimed by the EC2 platform at any time. Because SEP drives performance by dividing the processing of each query across workers, the sudden removal of an expected resource can cause query failures. Further, when using a single AZ, Spot instance rebalancing is less effective, because only the capacity pools in that single AZ are available to it.

To use Spot instances for the SEP workers, create a third node group to contain them, separating the non-Spot coordinator node group from the Spot worker node group. The remaining node group contains support services as before.

If you decide to use Spot instances for your SEP deployment, review the AWS best practices for EC2 Spot. SEP can be deployed in autoscaling groups of Spot instances drawn from any of the following instance categories:

  • General Purpose

  • Memory Optimized

  • Compute Optimized

Best practice for SEP with Spot is to use m5.xlarge or smaller instances to reduce the likelihood of query failure when a worker goes down. To enable Spot instances for a node group, update your configuration YAML with the desired instance sizes and add spot: true to the managedNodeGroups entry:

managedNodeGroups:

  - name: SEP-MANAGED-NODE-GROUP-EXAMPLE
    ...
    instanceTypes: ["m5.xlarge", "m5a.xlarge", "m5ad.xlarge"]
    spot: true
    ...

The appropriate choice for your deployment may differ based on workload requirements. Consult Starburst Support for guidance on which approach is best for you.

Save your file and create your cluster#

When you are finished adding the required content to your sep_eks_cluster.yaml file, save it and use it to create your cluster with eksctl:

$ eksctl create cluster -f sep_eks_cluster.yaml

When the command completes successfully, the following message appears in your terminal:

2021-07-14 14:28:56 [✔]  EKS cluster "my-sep-cluster" in "us-east-2" region is ready

Note

If you get an error about not being able to deploy Kubernetes 1.20, ensure you are running eksctl version 0.54.0 or later.
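
You can also verify that the three managed node groups exist and that their nodes carry the expected allow labels, using the cluster name and region from the example metadata:

$ eksctl get nodegroup --cluster my-sep-cluster --region us-east-2
$ kubectl get nodes --show-labels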

Next steps#

  • Consider enabling autoscaling. This is not a requirement, and you can review that topic at any time, even after installation.

  • Prepare to install.