Getting started

Starburst Admin is a collection of Ansible playbooks for installing and managing Starburst Enterprise platform (SEP) or Trino clusters.

Note

Current release: Starburst Admin 1.14.0.

Starburst Admin includes the following features:

Installation and upgrade of Starburst Enterprise platform (SEP) or Trino using the RPM or tar.gz archives, including the bundled Eclipse Temurin OpenJDK distribution from Adoptium
Updates to the coordinator and worker node configuration files, including catalog properties files for data source configuration
Service management (start, stop, restart, status) of the cluster and of all nodes
Collection of logs and Java thread dumps
Support for adding custom binary files, such as custom connectors or UDFs
Support for adding custom configuration files

All target machines must meet the requirements outlined below prior to installing Starburst Admin.

Starburst Admin does not manage the creation of the servers, the operating system installation and configuration, or the Python installation. It is also not designed to manage other related tools such as Apache Ranger, a Hive Metastore Service, or any data source.

It is most suitable for managing clusters installed on bare-metal servers or virtual machines. If you use containers and Kubernetes, use the SEP Kubernetes support with Helm instead of Starburst Admin.

Note

The legacy Presto Admin is deprecated and is no longer supported for Starburst Enterprise version 354-e and higher.

Requirements

Deep knowledge of Ansible is not required to use Starburst Admin, but familiarity with Ansible and Ansible playbooks is.

The following sections detail the requirements for the machine where you run Ansible and Starburst Admin, called the control node, and the requirements for the machines where you install and manage SEP, called the cluster nodes.

Requirements for the control node

The control node runs Starburst Admin, and therefore its Ansible playbooks. Standard Ansible requirements apply:

Ansible 2.10 or higher
Linux/Unix operating system
Python 3.5 or higher

In addition, the following resources are needed:

x86_64 (AMD64) or AArch64 (ARM64) processor architecture
SSH connectivity to the cluster nodes
Downloaded Starburst Enterprise platform (SEP) or Trino tar.gz or RPM archive files on the control node or, alternatively, a URL for the files that is accessible from all cluster nodes

The control node can be any machine that is configured to fulfill these requirements. For initial testing, you can use your workstation or even a node in the cluster directly. Production usage should follow Ansible best practices and use dedicated workflow, orchestration, and automation tools such as Ansible Tower or Concord.
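Before running any playbook, you can confirm that a candidate control node meets the minimums listed above. The following is a minimal sketch, assuming GNU sort with -V support and that python3 and ansible are on the PATH; the function names version_ge, supported_arch, and check_control_node are hypothetical helpers and not part of Starburst Admin.

```shell
# Hypothetical helpers, not part of Starburst Admin.
# Assumes GNU `sort -V` for version ordering.

# Succeed if version $1 is greater than or equal to version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Succeed if the architecture name in $1 is x86_64 (AMD64) or AArch64 (ARM64).
supported_arch() {
  case "$1" in
    x86_64|amd64|aarch64|arm64) return 0 ;;
    *) return 1 ;;
  esac
}

# Print every unmet requirement; return non-zero if any check fails.
check_control_node() {
  rc=0
  py=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null)
  if [ -z "$py" ] || ! version_ge "$py" 3.5; then
    echo "Python 3.5 or higher is required (found: ${py:-none})"; rc=1
  fi
  # First line looks like "ansible 2.10.17" or "ansible [core 2.15.3]".
  ans=$(ansible --version 2>/dev/null | head -n 1 \
        | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1)
  if [ -z "$ans" ] || ! version_ge "$ans" 2.10; then
    echo "Ansible 2.10 or higher is required (found: ${ans:-none})"; rc=1
  fi
  if ! supported_arch "$(uname -m)"; then
    echo "Unsupported architecture: $(uname -m)"; rc=1
  fi
  return "$rc"
}
```

Source the functions and run check_control_node; it prints each unmet requirement and returns a non-zero status if any check fails. Version strings are compared with sort -V so that, for example, 2.10 is correctly treated as newer than 2.9.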

Requirements for managed cluster nodes

Starburst Admin does not manage the cluster hardware, operating system, or package installation. It assumes that all nodes in the cluster already exist and fulfill the requirements detailed in this section.

Typically, provisioning systems such as Puppet, Chef, Terraform, and others are used to prepare the cluster nodes. All cluster nodes need to fulfill the normal Starburst Enterprise platform (SEP) requirements, including a supported version of Red Hat Enterprise Linux.

Memory and hardware resource requirements depend on the planned capacity of the cluster. The following are a few high-level guidelines:

Use identical hardware configurations for all workers.
Start with at least two workers and scale up as needed.
Prefer fewer, more powerful worker nodes over many smaller ones.
For performance reasons, nodes are ideally located on the same subnet and within the same data center. All nodes communicate using TCP/IP.

Specific testing is performed with Red Hat Enterprise Linux (RHEL) versions 7, 8, and 9. Use other 64-bit Linux distributions at your own risk.

Additional requirements:

SSH access and connectivity from the control node must be enabled, and the configured user must have root or sudo access. If the sudo user requires a password, use the --ask-become-pass option when running playbooks. Alternatively, Starburst Admin can install SEP as a non-root user, as long as the non-root requirements are satisfied.
rsync, which is often an optional package that must be installed separately.
bash, which is typically installed by default.

The following requirements must be met in order to install SEP as a non-root user:

The user that Ansible uses to SSH into the target nodes must already exist on the target nodes before you run any playbooks.
The base directory that SEP is installed into, such as /opt/starburst, must already exist on the target node and the Ansible user must have read and write access to that directory.
The directory variables in the playbooks/vars.yml file must reflect the SEP base directory, as described in the Installation guide.
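To confirm the directory requirement on a target node before a non-root installation, a check along these lines could be run as the Ansible user. This is a minimal sketch; check_base_dir is a hypothetical helper, and the default path /opt/starburst is the example from the list above, so substitute your own base directory.

```shell
# Hypothetical helper, not part of Starburst Admin.
# Checks that the SEP base directory exists and that the current user
# can read and write it. Defaults to the example path /opt/starburst.
check_base_dir() {
  dir=${1:-/opt/starburst}
  if [ ! -d "$dir" ]; then
    echo "missing: $dir"
    return 1
  fi
  if [ ! -r "$dir" ] || [ ! -w "$dir" ]; then
    echo "no read/write access for $(id -un): $dir"
    return 1
  fi
  echo "ok: $dir"
}
```

If the directory is missing, an administrator with root access could create it and hand it over to the Ansible user, for example with sudo install -d -o <ansible-user> /opt/starburst, where <ansible-user> is a placeholder for your user name.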

When using Starburst Admin with an RPM archive:

An RPM-based Linux distribution is required.
The rpm command, yum, dnf, and other similar commands are not required.
RPM-based installation is not supported when installing SEP as a non-root user, because the RPM installs into directories that require root access.

When using Starburst Admin with a tar.gz archive:

GNU tar command
unzip command
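The command requirements above can be checked on each cluster node before running playbooks. The following is a minimal sketch; missing_commands, check_gnu_tar, and required_for_node are hypothetical helpers, and the command lists simply mirror the requirements stated in this section.

```shell
# Hypothetical helpers, not part of Starburst Admin.

# Print the name of every command in the argument list that is not on PATH.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# GNU tar reports "tar (GNU tar) <version>" on the first line of --version.
check_gnu_tar() {
  tar --version 2>/dev/null | head -n 1 | grep -q 'GNU tar'
}

# Commands required on a node for each archive type described above.
required_for_node() {
  case "$1" in
    tar.gz) echo "bash rsync tar unzip" ;;
    rpm) echo "bash rsync" ;;
    *) return 1 ;;
  esac
}
```

For example, missing_commands $(required_for_node tar.gz) prints nothing on a node that is ready for a tar.gz installation; combine it with check_gnu_tar to make sure the tar on the PATH is GNU tar.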

Install Starburst Admin on the control node

Starburst Admin is a collection of Ansible playbooks that you install on the control node:

Contact Starburst Support for the Starburst Admin tar.gz binary package. Alternatively, if you have access to the Starburst Admin repository, download the tar.gz file for the latest release tag.
Move it onto the control node into any directory, such as ~/tmp.
Open a command-line interface and change to that directory.

Install the collection with the following command:

ansible-galaxy collection install starburst-admin-*.tar.gz

Confirm the command finishes successfully:

Starting galaxy collection install process
Process install dependency map
Starting collection install process
Installing 'starburst.admin:1.14.0' to '....'
starburst.admin:1.14.0 was installed successfully

The collection is installed into /home/<username>/.ansible/collections by default. The directory /home/<username>/.ansible/collections/ansible_collections/starburst/admin/files holds the binaries and all the configuration files for a cluster. Manage the files in this directory with a version control system.

You can override the installation path with the option -p <installation-path>.
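Because the files directory holds all cluster binaries and configuration, it is a natural candidate for a Git repository. The following is a minimal sketch, assuming Git is installed on the control node; admin_files_dir and init_files_repo are hypothetical helpers that derive the files directory from a collections path, defaulting to the path described above.

```shell
# Hypothetical helpers, not part of Starburst Admin.

# Print the Starburst Admin files directory for a collections path
# (default: ~/.ansible/collections, as described above).
admin_files_dir() {
  root=${1:-$HOME/.ansible/collections}
  echo "$root/ansible_collections/starburst/admin/files"
}

# Initialize a Git repository in the files directory if none exists yet.
init_files_repo() {
  dir=$(admin_files_dir "$1")
  [ -d "$dir" ] || { echo "not found: $dir" >&2; return 1; }
  if [ ! -d "$dir/.git" ]; then
    git -C "$dir" init -q
  fi
}
```

If you installed with -p <installation-path>, pass the same path, for example init_files_repo <installation-path>, since ansible-galaxy places collections under an ansible_collections subdirectory of that path.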

Install on multiple control nodes

If you need to install the collection on multiple control nodes, you can make the binary available at a remote URL:

Make the binary available on a server via HTTP, for example, https://repo.example.com/files/starburst-admin-1.14.0.tar.gz.

Create a requirements.yml file that includes a link to the binary.

---
collections:
    # Example link to tar.gz package
    - https://repo.example.com/files/starburst-admin-1.14.0.tar.gz

Use the YAML file for the installation:

ansible-galaxy collection install -r requirements.yml
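To keep all control nodes on the same release, the requirements.yml file can be generated from a single version string. The following is a minimal sketch; write_requirements is a hypothetical helper, and https://repo.example.com/files is the example host from the steps above, not a real repository.

```shell
# Hypothetical helper, not part of Starburst Admin.
# Writes a requirements.yml that pins one Starburst Admin release.
# Arguments: version, base URL (example default), output file.
write_requirements() {
  version=$1
  base_url=${2:-https://repo.example.com/files}
  out=${3:-requirements.yml}
  cat > "$out" <<EOF
---
collections:
    # Starburst Admin $version tar.gz package
    - $base_url/starburst-admin-$version.tar.gz
EOF
}
```

For example, write_requirements 1.14.0 produces the same requirements.yml shown above; commit it alongside your cluster configuration and run ansible-galaxy collection install -r requirements.yml on each control node.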

Next steps

Now that you have set up the control nodes and the managed cluster nodes, you can proceed with the initial installation on the cluster. Before proceeding with the installation, review the additional services that may be required, as discussed in the following sections.

Automated memory configuration

The amount of memory that SEP consumes can be fine-tuned by adjusting settings in the jvm.config and config.properties files. Starburst Admin has a memory_auto_config parameter section that sets these memory-related properties for you, based on your answers to a few questions.

Hive Metastore Service or AWS Glue

SEP must be configured to work with an existing Hive Metastore Service or AWS Glue if any of the following are true:

One of your catalogs, such as a Hive, Delta Lake, or Iceberg catalog, requires a configured metastore.
You are configuring the cache service.
You are enabling data products.
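As an illustration of the first case, a Hive catalog properties file points either at a Hive Metastore Service or at AWS Glue. The following is a minimal sketch, assuming that catalog files live in a catalog subdirectory of the Starburst Admin files directory; the catalog name datalake, the helper write_hive_catalog, and the metastore host are hypothetical. The properties connector.name, hive.metastore.uri, and hive.metastore=glue are standard Hive connector settings.

```shell
# Hypothetical helper, not part of Starburst Admin.
# Assumes catalogs are kept in <files_dir>/catalog; "datalake" is an
# example catalog name.
# Arguments: files directory, then "glue" or a thrift:// metastore URI.
write_hive_catalog() {
  files_dir=$1
  metastore=$2
  mkdir -p "$files_dir/catalog"
  {
    echo "connector.name=hive"
    if [ "$metastore" = "glue" ]; then
      echo "hive.metastore=glue"
    else
      echo "hive.metastore.uri=$metastore"
    fi
  } > "$files_dir/catalog/datalake.properties"
}
```

For example, write_hive_catalog <files-dir> thrift://metastore.example.com:9083 targets a Hive Metastore Service on its conventional port 9083, while write_hive_catalog <files-dir> glue uses AWS Glue instead.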

Backend service

You must enable and configure the SEP backend service, which requires an externally-managed database to be available. Read more about this in our backend service topic.

Cache service

The Starburst cache service provides the ability to configure and automate the management of table scan redirections and materialized views in supported connectors. It is disabled by default.

All Starburst Admin deployments enabling the cache service must use the cache service in embedded mode.

Data products

Data products provides a collection of curated, high-quality related datasets and relevant metadata for important data in your organization. You must configure SEP to enable data products.

Insights

Recent query and cluster activity are enabled by default; however, you must explicitly configure SEP to enable query history and usage metrics.