AWS EKS networking best practices

When deploying Starburst Enterprise platform (SEP) into an AWS EKS cluster, place your cluster resources with the following in mind:

  • cluster communications latency
  • data ingress/egress costs
  • IP address availability

Use a single availability zone inside your existing VPC

Using your existing VPC is a key cost control measure because it keeps the costs of data ingress and egress to a minimum. A new VPC typically comes with its own NAT gateway. While a NAT gateway itself is inexpensive, charges for the data transferred through it add up very quickly. As a best practice, place SEP inside your existing VPC, co-resident with as many of your data sources as possible. This not only keeps costs down, it also greatly simplifies networking and security.

Performance is equally important. Whether you use an existing or a new VPC, SEP must run in a single availability zone (AZ) to ensure the best possible performance. To accomplish this, use node groups, and tie them to a single AZ using affinity or node selection rules.

For SEP, two managed node groups are required. The SEP coordinator and workers are deployed to one group, while support services such as the Hive Metastore Service (HMS), Ranger, and nginx are deployed to the second.
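The two-node-group placement can be expressed as Kubernetes `nodeSelector` rules. The following Python sketch builds those selectors as plain dictionaries that could be serialized into pod specs. It assumes the well-known `topology.kubernetes.io/zone` node label and the `eks.amazonaws.com/nodegroup` label that EKS applies to nodes in managed node groups. The zone and node group names are hypothetical placeholders, and how the selectors are passed to the SEP Helm charts is not shown.

```python
import json

# Hypothetical values: substitute your own AZ and managed node group names.
TARGET_ZONE = "us-east-1a"
SEP_NODE_GROUP = "sep-nodes"          # SEP coordinator and workers
SUPPORT_NODE_GROUP = "support-nodes"  # HMS, Ranger, nginx


def node_selector(node_group: str, zone: str = TARGET_ZONE) -> dict:
    """Return a pod nodeSelector that pins pods to one node group in one AZ."""
    return {
        "topology.kubernetes.io/zone": zone,       # well-known zone label
        "eks.amazonaws.com/nodegroup": node_group,  # set by EKS managed node groups
    }


placement = {
    "coordinator": node_selector(SEP_NODE_GROUP),
    "worker": node_selector(SEP_NODE_GROUP),
    "hive-metastore": node_selector(SUPPORT_NODE_GROUP),
    "ranger": node_selector(SUPPORT_NODE_GROUP),
    "nginx": node_selector(SUPPORT_NODE_GROUP),
}

if __name__ == "__main__":
    # Every component lands in the same AZ; only the node group differs.
    print(json.dumps(placement, indent=2))
```

Pinning on both labels means a pod stays Pending rather than silently scheduling into another AZ if the node group has no capacity there.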

Spot instances and single AZs

AWS EC2 Spot instances are sometimes desirable to keep costs down. However, they are inherently unreliable, because EC2 can reclaim them at any time. SEP achieves its performance by dividing the processing of each query across the cluster, so the sudden removal of a resource that a query expects can cause that query to fail. Further, in a single AZ, Spot instance rebalancing is less effective, because only the Spot capacity pools in that AZ are available to it. If you still want to use Spot instances for the SEP workers, create a third node group to contain them, separating the non-Spot coordinator node group from the Spot worker node group. The remaining node group contains support services as before.
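The three-node-group layout can be sketched in Python as plain `nodeSelector` dictionaries, together with a check that nothing except the workers lands on Spot capacity. The sketch assumes the `eks.amazonaws.com/capacityType` label (`ON_DEMAND` or `SPOT`) and the `eks.amazonaws.com/nodegroup` label that EKS applies to nodes in managed node groups, plus the well-known `topology.kubernetes.io/zone` label. The node group names and the zone are hypothetical.

```python
ZONE_LABEL = "topology.kubernetes.io/zone"
GROUP_LABEL = "eks.amazonaws.com/nodegroup"
CAPACITY_LABEL = "eks.amazonaws.com/capacityType"  # "ON_DEMAND" or "SPOT"
TARGET_ZONE = "us-east-1a"  # hypothetical single AZ

# Hypothetical managed node groups and the capacity type each one uses.
NODE_GROUPS = {
    "coordinator-nodes": "ON_DEMAND",
    "worker-spot-nodes": "SPOT",
    "support-nodes": "ON_DEMAND",
}

COMPONENT_TO_GROUP = {
    "coordinator": "coordinator-nodes",
    "worker": "worker-spot-nodes",
    "hive-metastore": "support-nodes",
    "ranger": "support-nodes",
    "nginx": "support-nodes",
}


def selector_for(component: str) -> dict:
    """Build the nodeSelector for one component from its node group."""
    group = COMPONENT_TO_GROUP[component]
    return {
        ZONE_LABEL: TARGET_ZONE,
        GROUP_LABEL: group,
        CAPACITY_LABEL: NODE_GROUPS[group],
    }


def check_layout() -> None:
    """Raise if anything other than the SEP workers is placed on Spot capacity."""
    for component in COMPONENT_TO_GROUP:
        if selector_for(component)[CAPACITY_LABEL] == "SPOT" and component != "worker":
            raise ValueError(f"{component} must not run on Spot instances")


if __name__ == "__main__":
    check_layout()
    for component in COMPONENT_TO_GROUP:
        print(component, selector_for(component))
```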

IP address requirements

An important consideration in using an existing VPC is IP address availability. As part of standing up your SEP cluster, you must ensure that sufficient IP addresses are reliably available for use by your SEP instances.

In EKS clusters, AWS creates one subnet per availability zone. These are usually configured as /20 CIDR (Classless Inter-Domain Routing) blocks with 4,091 IP addresses available for use; AWS reserves the remaining 5 of the 4,096 addresses in every subnet. SEP requires that all hosts, both workers and coordinators, be sized identically. Each instance type supports a fixed number of IP addresses per network interface, and EKS reserves twice that number of addresses for each instance.

For purposes of this discussion, we assume an SEP deployment of six m5.xlarge workers and one m5.xlarge coordinator. Each of these instances supports up to 15 IP addresses per network interface. EKS reserves an additional 15, for a total of 30 IP addresses needed per instance. Together, the seven instances require 210 available IP addresses:

  ( 7 m5.xlarge instances )
x ( 15 IP addresses in use per instance )
x ( 2: in-use addresses plus an equal number reserved )
= 210 IP addresses needed

In this example, you must ensure that a minimum of 210 IP addresses are reliably available for use by your SEP instances at all times.
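The arithmetic above can be checked with a short Python sketch. It uses the standard `ipaddress` module to compute the usable addresses in a subnet (total addresses minus the 5 that AWS reserves) and applies this section's sizing rule of in-use addresses times two. The function names and the `10.0.0.0/20` subnet are hypothetical; the per-instance figure of 15 and the factor of 2 are taken from the example above, not looked up from instance limits.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # AWS reserves 5 IPv4 addresses in every subnet


def usable_ips(cidr: str) -> int:
    """Addresses left for instances in a subnet after AWS's reservations."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def required_ips(instances: int, ips_per_instance: int, reserve_factor: int = 2) -> int:
    """IPs needed when EKS holds reserve_factor times each instance's in-use IPs."""
    return instances * ips_per_instance * reserve_factor


if __name__ == "__main__":
    subnet = "10.0.0.0/20"  # hypothetical subnet CIDR
    needed = required_ips(instances=7, ips_per_instance=15)
    available = usable_ips(subnet)
    print(f"{subnet}: {available} usable, {needed} needed")
    # -> 10.0.0.0/20: 4091 usable, 210 needed
```

This compares against the subnet's total capacity. Because other workloads in an existing VPC may already hold addresses, a stricter check compares `needed` against the `AvailableIpAddressCount` that EC2 reports for the subnet.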

Using subnets

You can use an existing subnet or create a new one. It must have a route out to the internet, either through a NAT gateway or an internet gateway (IGW), so that your EKS cluster can communicate with the EKS control plane. Cost considerations for these communications are minimal.
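To confirm that a subnet has the required route, inspect the route table associated with it. The Python sketch below works on plain dictionaries modeled on the EC2 `DescribeRouteTables` response (fields `DestinationCidrBlock`, `GatewayId`, `NatGatewayId`, `State`), so it runs without AWS credentials. Fetching the real data, for example with boto3's `describe_route_tables`, is left out; the function name and all IDs are hypothetical, and only the IPv4 default route is checked.

```python
# Sample route tables shaped like entries in the "RouteTables" list returned
# by the EC2 DescribeRouteTables API. All IDs are hypothetical.
private_subnet_rt = {
    "Routes": [
        {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
        {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0abc1234", "State": "active"},
    ]
}
isolated_rt = {
    "Routes": [
        {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
    ]
}


def has_internet_route(route_table: dict) -> bool:
    """Return True if a usable IPv4 default route targets a NAT gateway or an IGW."""
    for route in route_table.get("Routes", []):
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue  # only the IPv4 default route matters here
        if route.get("State") == "blackhole":
            continue  # the route's target is gone, so traffic is dropped
        if route.get("NatGatewayId") or route.get("GatewayId", "").startswith("igw-"):
            return True
    return False


if __name__ == "__main__":
    print(has_internet_route(private_subnet_rt))  # True
    print(has_internet_route(isolated_rt))        # False
```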

Considerations if you must use VPC peering

If you cannot place SEP within your current VPC because of a scarcity of IP addresses, you can instead create the EKS cluster in a new VPC and set up a peering connection between that VPC and your existing one. This avoids the often cost-prohibitive alternative of passing all data through the NAT gateway. VPC peering requires additional setup, and comes with potential downsides:

  • Does not scale well, because each peering connection links only two VPCs.
  • Transitive routing is not available.
  • Peering connections are a resource that must be managed.
  • Firewall rules must be carefully managed.

Additionally, you must ensure that the CIDR you set for the new SEP VPC does not match or overlap with your existing VPC’s CIDR. VPCs with overlapping CIDRs cannot create peering connections.
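Before requesting the peering connection, you can check the CIDR blocks of both VPCs for overlap. This Python sketch relies on `ipaddress.ip_network(...).overlaps(...)` from the standard library and compares every block of one VPC against every block of the other, since a VPC can have secondary CIDR blocks. The function name and the CIDR values are hypothetical.

```python
import ipaddress
from itertools import product
from typing import Sequence


def can_peer(vpc_a_cidrs: Sequence[str], vpc_b_cidrs: Sequence[str]) -> bool:
    """Return True only if no CIDR block of VPC A overlaps any block of VPC B.

    Identical blocks also count as overlapping, so they are rejected too.
    """
    for a, b in product(vpc_a_cidrs, vpc_b_cidrs):
        if ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b)):
            return False
    return True


if __name__ == "__main__":
    existing_vpc = ["10.0.0.0/16"]  # hypothetical existing VPC
    print(can_peer(existing_vpc, ["10.0.128.0/20"]))  # False: inside 10.0.0.0/16
    print(can_peer(existing_vpc, ["10.1.0.0/16"]))    # True
```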