Installing Starburst Enterprise in AWS#

Now that you’ve satisfied the all the requirements you are ready to proceed with installing Starburst Enterprise platform (SEP) in AWS.

Overview#

SEP is available as an Amazon Machine Image (AMI), which in turn is used in the CloudFormation template (CFT). You can get the resources directly from Starburst, or from the AWS Marketplace

You can use the EC2 console or AWS CLI to deploy a SEP cluster as a CFT.

Deploy a cluster with the EC2 console#

1. Select template

After you subscribe to the SEP offering on AWS Marketplace, or receive the URL to the CFT, you are able to begin configuring and launching your cluster.

../_images/trino_cft_launch.png

This directs you to the Select Template step for creating a CloudFormation stack. Find the pre-populated field under Amazon S3 template URL. This is the location of Starburst’s SEP CloudFormation template. Click Next.

Optionally you can choose Copy to Service Catalog for when using with the AWS Service Catalog.

../_images/trino_cft_select.png

2. Specify details

Specify the details of your cluster defined by the CFT configuration parameters. This step includes network, EC2 and SEP and other configuration options.

SEP cluster details#

Field

Description

Preliminary details

Provide a name for your stack that allows you to distinguish it from other stacks running in your AWS account. For example, “sepcluster”.

Network configuration

Specify the network configuration parameters. They are typically preconfigured. See the VPC and VPC subnets and Security groups sections for more detail.

EC2 configuration

Specify the EC2 configuration parameters. Note that you need to ensure a volume is mounted and configured on each worker, with the WorkerMountVolume and related parameters if you are using caching for distributed storage.

SEP configuration

Specify the SEP configuration parameters.

Hive connector options

Specify the relevant Hive connector options if you plan to query data in HDFS or S3.

Ranger options and Ranger LDAP user synchronization

These options allow you to configure Apache Ranger for the system level security with Apache Ranger and the related synchronization with an LDAP backend for user and group information. More information is available in the table of parameters.

Advanced AWS S3 configuration

Optionally configure the parameters to access custom S3 or S3-compatible systems.

Monitoring

Indicate if you want to enable integration with CloudWatch metrics.

IAM Instance

Optionally specify the IAM InstanceProfile.

Other

Indicate whether you are deploying and using Apache Superset and other configuration.

3. Options

Enter any additional stack specifications as shown on the Options page. These options include adding tags to resources within your cluster, choosing IAM roles, and specifying monitoring time for rollback triggers, among other advanced specifications. More information can be found in the AWS CloudFormation documentation.

4. Review

Finally, review the details of your SEP cluster. When content, proceed by pressing create to conclude the creation.

Note

Just above the final “Create stack” button there is a blueish box informing you that The following resource(s) require capabilities: [AWS::IAM::Role]. In order to create the Stack you need to mark the checkbox next to I acknowledge that AWS CloudFormation might create IAM resources.

Deploying a cluster with the AWS CLI#

You can deploy an SEP cluster using the AWS CLI instead of the AWS Web Console. Ensure that the AWS CLI is installed and configured before you proceed.

1. Edit script file

Open a terminal window and create a script file. This file will contain a long list of properties for the CFT, so separate each segment with a backslash to make the script easier to read and write. Refer to the example script below for sample formatting.

2. Create stack

Add a first line to the command that uses the AWS CLI to create a CloudFormation stack.

aws cloudformation create-stack \

3. Name stack

Specify the name that is to be associated with the cluster. The name must be unique in the region in which you are creating the cluster.

--stack-name exampleclustername \

4. Specify template

Add the CFT template URL you received. The URL must point to a template that is located in an Amazon S3 bucket.

--template-url https://s3.amazonaws.com/example-context/TrinoCFT.template \

5. Specify parameters

Define a list of parameter structures that specify input parameters for the cluster. Refer to the complete list of parameters in the configuration section Parameter values need to be provided on the command line in special form. Refer to the example below for guidance.

6. Configure rollback options

Configure the rollback ability as desired to disable rollback of the cluster if stack creation fails, with one of the following options:

--disable-rollback \
--no-disable-rollback \
RollbackTriggers=[{Arn=string,Type=string},{Arn=string,Type=string}],MonitoringTimeInMinutes=integer

7. IAM capabilities acknowledgement

Analogous to how you mark the I acknowledge that AWS CloudFormation might create IAM resources. checkbox, when deploying SEP via the AWS console, you need to denote that you accept this capability when deploying with the CLI. This is done by adding a --capabilities parameter to the AWS CLI create-stack command.

--capabilities CAPABILITY_IAM \

8. Review

Finally, review the details of your cluster and your command. When ready, save and run the script file to create your cluster.

Example CFT creation script#

The following example shows a full command to create a cluster with the AWS CLI:

aws cloudformation create-stack \
--stack-name "SEP-cluster" \
--template-url "https://s3.amazonaws.com/awsmp-fulfillment-cf-templates-prod/StarburstCFT.template" \
--parameters \
"ParameterKey=VPC,ParameterValue=vpc-4bd6ca11" \
"ParameterKey=Subnet,ParameterValue=subnet-123abc2b" \
"ParameterKey=SecurityGroups,ParameterValue=sg-12e34aeb" \
"ParameterKey=CoordinatorInstanceType,ParameterValue=r4.xlarge" \
"ParameterKey=WorkersInstanceType,ParameterValue=r4.xlarge" \
"ParameterKey=KeyName,ParameterValue=john.smith" \
"ParameterKey=IamInstanceProfile,ParameterValue=my-ec2-instance-profile" \
"ParameterKey=WorkersCount,ParameterValue=2" \
"ParameterKey=LaunchSuperset,ParameterValue=yes" \
"ParameterKey=MetastoreType,ParameterValue='External MySQL RDBS'" \
"ParameterKey=ExternalMetastoreHost,ParameterValue=172.31.6.18"  \
"ParameterKey=ExternalMetastorePort,ParameterValue=3306"  \
"ParameterKey=ExternalRdbmsMetastoreUserName,ParameterValue=hive"  \
"ParameterKey=ExternalRdbmsMetastorePassword,ParameterValue='q@55vv0r|>'"  \
"ParameterKey=ExternalRdbmsMetastoreDatabaseName,ParameterValue=hive_metastore"  \
"ParameterKey=AdditionalCoordinatorConfigurationURI,ParameterValue=s3://my_bucket/starburst-additional-coordinator-configuration-1.0.zip" \
"ParameterKey=AdditionalWorkersConfigurationURI,ParameterValue=s3://my_bucket/starburst-additional-workers-configuration-1.0.zip" \
"ParameterKey=BootstrapScriptURI,ParameterValue=s3://my_bucket/starburst-bootstrap-1.0.sh" \
"ParameterKey=LicenseURI,ParameterValue=s3://my_bucket/starburstdata.license" \
"ParameterKey=S3Endpoint,ParameterValue=https://mybucket.s3-us-west-2.amazonaws.com" \
"ParameterKey=S3AccessKey,ParameterValue=AKIAIOSFODNN7EXAMPLE" \
"ParameterKey=S3SecretKey,ParameterValue=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY" \
--capabilities CAPABILITY_IAM

The above commands yields output like the following:

{
  "StackId":"arn:aws:cloudformation:us-east-1:123456789012:stack/myteststack/466df9e0-0dff-08e3-8e2f-5088487c4896"
}

Write a bootstrap script#

Bootstrap scripts are a powerful tool that can streamline cluster provisioning and maintenance. Commonly, these scripts are used to:

  • Download and place plugin resources such as additional connectors or UDFs

  • Modify the node’s filesystem with additional directories

  • Place SEP configuration files that are not handled via the AdditionalCoordinatorConfigurationURI and AdditionalWorkersConfigurationURI CFT parameters

The following example demonstrates a coordinator node check followed by a coordinator-specific modification. Then, the script downloads a plugin resource and creates a directory.

#!/usr/bin/env bash
if [[ "$1" == "coordinator" ]]; then
  echo "io.trino.security=DEBUG" >> /usr/lib/starburst/etc/log.properties
fi

sudo aws s3 cp s3://mybucket/starburst/config/json-serde-jar-with-dependencies.jar /usr/lib/starburst/plugin/hive/

sudo mkdir -p /var/log/starburst/dev/

The following steps are necessary to use a script:

  • Write the script as desired

  • Place the script file in an S3 bucket

  • Set the BootstrapScriptURI parameter to the correct S3 path of the file

The bootstrap script is executed on all nodes after instance provisioning, but before starting SEP.

A best practice is to update the name of the file with a version, an enumerator or a date value with each change.

Expose secrets to SEP#

You can expose secrets such as usernames and password to SEP by using environment variables.

The AWS secrets manager (ASM) stores and manages secrets.

Create a secrets document in ASM using the following format:

{
  "secrets": [
    {
      "name": "secretName0",
      "value": "secretValue0",
      "description": "secretDescription0"
    },
    {
      "name": "secretName1",
      "value": "secretValue1",
      "description": "secretDescription1"
    },
  ]
}

Write a small script to retrieve the file from ASM, extract the secrets and expose them as environment variables.

#!/usr/bin/env bash
# A simple script utilizing the AWS CLI v2 and ASM
#
# $1 = id of the secret in ASM, which is a JSON document
#

aws secretsmanager get-secret-value \
--secret-id ${1} \
--query SecretString --output text | \
jq -r '.secrets[] | "export " + .name + "=\"" + .value + "\""'

The script produces the following output with the above secrets file.

export secretName0=secretValue0
export secretName1=secretValue1

Embed the commands from the script into a bootstrap script. Ensure that your EC2 machines are granted access to ASM and run it within the same region.

As a result, your secrets are exposed as environment variables and can therefore be used in catalog files and wherever else you want to use secrets.

You can also use the script manually for testing. Call the script, named $HOME/bin/secrets-as-envars in this example, directly or from a shell startup script, such as $HOME/.bashrc:

eval $($HOME/bin/secrets-as-envars)

Uninstalling SEP#

Deleting a cluster on the AWS CloudFormation console#

1. Select the stack

From the list of clusters (CloudFormation stacks) within the AWS CloudFormation Console, select the cluster you intend to uninstall. Note that the cluster must currently be running in order to delete it.

2. Actions menu

Navigate to the Actions drop down menu and click Delete stack.

3. Confirm the delete action

When prompted with the option to confirm or cancel the delete request, click Yes, Delete to proceed with the uninstallation of your SEP cluster.

Once stack deletion has begun, you cannot terminate the process. The cluster proceeds to the DELETE_IN_PROGRESS state. After the cluster deletion is complete, the cluster is in the DELETE_COMPLETE state. Clusters in the DELETE_COMPLETE state are not displayed in the AWS CloudFormation console by default. To display deleted clusters, you must change the stack view setting. If the deletion fails, the cluster is in the DELETE_FAILED state.

For information on how to fix failed delete actions on the AWS CloudFormation Console, navigate to the DELETE STACK FAILS section within the AWS CloudFormation troubleshooting documentation.

Deleting a cluster in the AWS CLI#

To delete a cluster within the AWS CLI, you must run the AWS CloudFormation delete-stack command. This process requires you to identify the name of the stack you intend to delete.

Note

When you delete an SEP cluster, you delete the entire cluster and all of its resources. Moreover, you can not delete a cluster in which termination protection is enabled.

Reference the following example:

aws cloudformation delete-stack --stack-name mystarburststack

Safeguarding your cluster#

You can prevent your cluster from being accidentally deleted by enabling termination protection. See safeguarding your stacks.