Clusters #

A cluster in Starburst Galaxy provides the resources to run queries against numerous catalogs. You can access the data exposed by the catalogs with Query editor or other clients.

Access your clusters at any time by clicking Clusters in the left-hand menu. A demo cluster is included by default.

Starburst Galaxy allows you to create, edit, delete, start, and stop clusters.

Concepts #

Creating and managing clusters is an essential task for a platform administrator in Starburst Galaxy. A cluster with the desired catalogs is required for a data consumer to use SQL statements in client tools to analyze the available data. The following concepts are important to perform this work efficiently.

Cloud provider region and catalogs #

Catalogs define the details to access a data source. Any data source is located in a specific cloud region of a specific cloud provider. For example, your Cloud SQL for MySQL database is hosted in the us-east1 region of Google Cloud.

A cluster can include one or more catalogs. If multiple catalogs are configured, you can query them with SQL using the same client connection. You can also query the data in multiple catalogs within one SQL statement.
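As a rough illustration of querying multiple catalogs within one SQL statement, the following sketch builds a single join across two catalogs. The catalog, schema, table, and column names are hypothetical, and the sketch assumes the usual fully qualified catalog.schema.table naming. It only assembles the statement text; submit it with the Query editor or any other client connected to the cluster.

```python
# Hypothetical catalog.schema.table names, used only for illustration.
ORDERS_TABLE = "sales_mysql.shop.orders"      # table in a first catalog
CUSTOMERS_TABLE = "lake_s3.crm.customers"     # table in a second catalog


def build_cross_catalog_query(orders_table: str, customers_table: str) -> str:
    """Return one SQL statement that joins tables from two catalogs."""
    return (
        "SELECT c.name, count(*) AS order_count\n"
        f"FROM {orders_table} AS o\n"
        f"JOIN {customers_table} AS c ON o.customer_id = c.id\n"
        "GROUP BY c.name"
    )


query = build_cross_catalog_query(ORDERS_TABLE, CUSTOMERS_TABLE)
print(query)
```

Both tables are reachable over the same client connection because both catalogs are configured in the same cluster.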

A cluster and all its configured catalogs must be located in the same cloud provider and region. This allows for maximum performance and avoids data transfer costs for access across regions or even cloud providers.
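The same-provider, same-region rule can be checked before a cluster is configured. The following is a minimal sketch with hypothetical names (Location, mismatched_catalogs) and made-up catalog data; it is not part of any Starburst Galaxy API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Cloud provider and region of a cluster or a data source."""
    provider: str
    region: str


def mismatched_catalogs(cluster: Location, catalogs: dict[str, Location]) -> list[str]:
    """Return the names of catalogs that are not co-located with the cluster."""
    return sorted(name for name, loc in catalogs.items() if loc != cluster)


# Hypothetical example data.
cluster_location = Location("gcp", "us-east1")
catalog_locations = {
    "sales_mysql": Location("gcp", "us-east1"),
    "lake_s3": Location("aws", "us-east-1"),
}
print(mismatched_catalogs(cluster_location, catalog_locations))
```

An empty result means every catalog can be added to the cluster under the rule described above.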

Size and scaling #

The size of a cluster determines the number of server nodes, consisting of one coordinator and one or more workers, used to process queries. A larger cluster with more nodes can process more complex queries, handle more concurrent users, and provide higher performance by using more resources.

The available sizes include Free, X-Small, Small, Medium, Large, X-Large, and 2X-Large. You can create a cluster with any size, and change the size based on current needs. Changing the size requires a restart of the cluster. All nodes in a cluster are identical.

Best practice is to start with a smaller cluster and determine whether it can process all queries in your workload. Memory-related failures or slow query processing typically indicate that you should choose a larger size.

Clusters automatically scale the number of workers within a range from a minimum to a set maximum number of workers:

Size       Minimum number of workers   Maximum number of workers
Free       1                           1
X-Small    1                           2
Small      3                           4
Medium     5                           8
Large      9                           16
X-Large    17                          32
XX-Large   33                          64
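For scripting or capacity planning, the table above can be captured as a small lookup. This is a sketch only; the names WORKER_RANGES, worker_range, and can_autoscale are hypothetical helpers, and the values are copied from the table.

```python
# Worker ranges copied from the table above: size -> (minimum, maximum).
WORKER_RANGES = {
    "Free": (1, 1),
    "X-Small": (1, 2),
    "Small": (3, 4),
    "Medium": (5, 8),
    "Large": (9, 16),
    "X-Large": (17, 32),
    "XX-Large": (33, 64),
}


def worker_range(size: str) -> tuple[int, int]:
    """Return (minimum, maximum) workers for a cluster size."""
    try:
        return WORKER_RANGES[size]
    except KeyError:
        raise ValueError(f"unknown cluster size: {size!r}") from None


def can_autoscale(size: str) -> bool:
    """A size autoscales only if its minimum and maximum differ."""
    low, high = worker_range(size)
    return low < high


print(worker_range("Medium"), can_autoscale("Free"))
```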

For cluster sizes whose minimum and maximum number of workers differ, clusters automatically scale up toward the maximum for the configured size when the combined CPU usage of all workers exceeds 60%. Autoscaling adds one or more workers to bring the combined CPU usage of all workers below 60%. The autoscaling process takes approximately four minutes to make the first adjustment. If CPU usage continues to climb and exceeds 60% again, the process repeats until the maximum number of workers is reached.

Clusters automatically scale down to the minimum number of workers when the combined CPU usage of all workers drops below 60%. Autoscaling removes one or more workers until CPU usage approaches 60%. The autoscaling process takes approximately 15 minutes to make the first adjustment.
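The 60% rule can be approximated with a simple model. The sketch below assumes that the total CPU load stays constant and spreads evenly across workers, which is a simplification and not the documented autoscaling algorithm; the real process adjusts in steps over several minutes. The function name target_workers and its parameters are hypothetical.

```python
def target_workers(
    current_workers: int,
    cpu_percent: int,
    min_workers: int,
    max_workers: int,
    threshold_percent: int = 60,
) -> int:
    """Estimate the worker count that brings combined CPU usage to the threshold.

    Assumption: total load (workers x CPU percent) is constant and evenly spread.
    """
    load = current_workers * cpu_percent
    # Ceiling division: the fewest workers that keep usage at or below the threshold.
    needed = -(-load // threshold_percent)
    # Never leave the range allowed for the configured cluster size.
    return max(min_workers, min(max_workers, needed))


# Medium cluster (5 to 8 workers) at 90% combined CPU usage on 5 workers.
print(target_workers(5, 90, 5, 8))
# Same cluster at 30% combined CPU usage on 8 workers.
print(target_workers(8, 30, 5, 8))
```

Under this model the first call scales up to the maximum of the Medium size, and the second scales down to its minimum.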

Cluster status and transitions #

A cluster can be in one of the following states:

  • A stopped cluster consists of a small configuration set only. No significant resources are used, and no costs are incurred.
  • A starting cluster is in the process of entering the running state.
  • A running cluster consists of a number of server nodes. It remains in the running state while users are submitting queries for processing.
  • A suspended cluster consists of a small configuration set and a mechanism to listen for incoming user requests. It does not include any actively running server nodes, and no costs are incurred.

A newly created cluster begins in the stopped state, and can be started in the list of clusters.

A running cluster can be manually stopped in the list of clusters.

Idle shutdown time #

A running cluster becomes idle when no queries are submitted and all processing of queries is completed. Idle clusters automatically transition to suspended status when the configured idle shutdown time is reached. Available idle shutdown times include 1 minute, 5 minutes, 15 minutes, 30 minutes, and 1 hour.

When a user submits a query to a suspended cluster, the cluster is started, and the query is processed. The user must wait for the cluster to start, which typically takes between one and five minutes.
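The status transitions described so far can be summarized as a small state machine. This is an illustrative sketch: the event names ("start", "ready", "stop", "idle_timeout", "query") and the STARTING label for a cluster entering the running state are assumptions made for the example, not identifiers from Starburst Galaxy.

```python
from enum import Enum


class Status(Enum):
    STOPPED = "stopped"
    STARTING = "starting"    # assumed label for "entering the running state"
    RUNNING = "running"
    SUSPENDED = "suspended"


# (current status, event) -> next status, following the text above.
TRANSITIONS = {
    (Status.STOPPED, "start"): Status.STARTING,         # started from the list of clusters
    (Status.STARTING, "ready"): Status.RUNNING,
    (Status.RUNNING, "stop"): Status.STOPPED,           # stopped manually
    (Status.RUNNING, "idle_timeout"): Status.SUSPENDED, # idle shutdown time reached
    (Status.SUSPENDED, "query"): Status.STARTING,       # a submitted query starts the cluster
}


def next_status(status: Status, event: str) -> Status:
    """Return the status after an event, or raise if the text describes no such transition."""
    try:
        return TRANSITIONS[(status, event)]
    except KeyError:
        raise ValueError(f"no transition for {event!r} from {status.value}") from None


print(next_status(Status.RUNNING, "idle_timeout").value)
```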

You can also configure a cluster to Never suspend. This causes the cluster to remain up and running, even if no queries are processed and the cluster is idle. The advantage of this behavior is that any issued query is processed immediately, because there is no wait for the cluster to start. The disadvantage is the increased cost incurred.

The never suspend option is not available for free clusters.
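The idle shutdown rules above can be expressed as two small checks. In this sketch, None stands for the Never suspend option, and the names ALLOWED_IDLE_MINUTES, validate_idle_setting, and should_suspend are hypothetical.

```python
from typing import Optional

# Idle shutdown times listed above, in minutes.
ALLOWED_IDLE_MINUTES = (1, 5, 15, 30, 60)


def validate_idle_setting(size: str, idle_minutes: Optional[int]) -> None:
    """Raise ValueError for a setting the text rules out. None means Never suspend."""
    if idle_minutes is None:
        if size == "Free":
            raise ValueError("Never suspend is not available for free clusters")
        return
    if idle_minutes not in ALLOWED_IDLE_MINUTES:
        raise ValueError(f"unsupported idle shutdown time: {idle_minutes} minutes")


def should_suspend(idle_for_minutes: float, idle_minutes: Optional[int]) -> bool:
    """True once an idle cluster has reached its configured idle shutdown time."""
    if idle_minutes is None:  # Never suspend
        return False
    return idle_for_minutes >= idle_minutes


validate_idle_setting("Small", 15)
print(should_suspend(20, 15), should_suspend(20, None))
```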

Use cluster scheduling to transition clusters between running and suspended status automatically, based on specified days and times.

Clusters using the resource intensive query processing mode #

Enable the resource intensive query processing mode to run a cluster with fault-tolerant execution. This allows the cluster to retry queries, or parts of query processing, in the event of failures without having to start the whole query from the beginning. This is especially useful for long-running queries that are typical of batch processing and extract, transform, load (ETL) workloads.

In resource intensive query processing mode, intermediate exchange data is spooled and can be reused by another worker. Queries that require more memory than is currently available in the cluster can still succeed. Multiple queries share resources fairly and make steady progress.

Do not use resource intensive query processing mode if most queries in the cluster are short-running, typically completing in less than one minute, and require smaller amounts of memory. Query processing in resource intensive mode can be slightly slower than normal operation.

In Starburst Galaxy, you make a cluster run in resource intensive query processing mode with a simple execution mode toggle, and with no other configuration. You can make this designation either when creating a cluster or afterwards. To take a cluster back to standard processing, revert the toggle and restart the cluster. Resource intensive query processing is not available for the Free cluster size.
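The guidance in this section can be condensed into a rough rule of thumb. The sketch below is a simplification that considers only the typical query duration and the Free size restriction; the function name, the mode labels, and the use of a single duration value are assumptions made for illustration.

```python
def suggest_execution_mode(size: str, typical_query_seconds: float) -> str:
    """Suggest an execution mode from the guidance above (rule of thumb only)."""
    if size == "Free":
        # Resource intensive query processing is not available for the Free size.
        return "standard"
    if typical_query_seconds < 60:
        # Mostly short-running queries: the slight overhead is not worthwhile.
        return "standard"
    return "resource-intensive"


print(suggest_execution_mode("Large", 45 * 60))  # long-running batch or ETL queries
print(suggest_execution_mode("Large", 5))        # short interactive queries
```

Memory requirements also matter according to the text, so treat the result as a starting point rather than a decision.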

Fault-tolerant execution is not designed to recover from broken queries or incorrect SQL.

Cluster considerations #

The following additional characteristics apply to all clusters:

  • The maximum allowed query processing time is four hours. Longer running queries are terminated. Find relevant tips in the query troubleshooting section.

Manage clusters #

You can create, view, and manage clusters in the View clusters pane.

Create a cluster #

Before creating a cluster, make sure that you have configured the desired catalogs. This avoids restarts later. To create a cluster, click Create cluster and proceed with the following steps:

  1. Provide a meaningful name for users as the Cluster name.
  2. Add one or more Catalogs to provide access to the configured data sources in the cluster. The catalogs must use the same cloud provider and region as the cluster itself.
  3. Choose a Cloud provider region for deploying and running the cluster.
  4. Configure the Cluster type from the available sizes.
  5. If desired, click the slider to enable resource intensive query processing mode.
  6. Expand Advanced settings.
  7. Configure the Idle shutdown time to enable automatic suspension for when the cluster becomes inactive. You can also choose Never suspend.
  8. Choose roles to grant access to the cluster and configure access with Grant access to users with role(s).
  9. Click Create cluster to save the configuration to the list of clusters.
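When planning a cluster before entering it in the user interface, the settings from the steps above can be collected in one record and checked up front. This is a sketch with hypothetical field names; it does not correspond to any Starburst Galaxy API or file format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterSpec:
    """Settings gathered in the Create cluster steps (field names are hypothetical)."""
    name: str
    catalogs: list[str]
    region: str
    size: str
    resource_intensive: bool
    idle_shutdown_minutes: Optional[int]  # None stands for Never suspend
    roles: list[str]

    def problems(self) -> list[str]:
        """Return human-readable issues based on the rules stated in this document."""
        issues = []
        if not self.name.strip():
            issues.append("cluster name is empty")
        if not self.catalogs:
            issues.append("at least one catalog is required")
        if self.size == "Free" and self.resource_intensive:
            issues.append("resource intensive mode is not available for the Free size")
        if self.size == "Free" and self.idle_shutdown_minutes is None:
            issues.append("Never suspend is not available for free clusters")
        return issues


spec = ClusterSpec("etl-nightly", ["sales_mysql"], "us-east1", "Free", True, 15, ["analyst"])
print(spec.problems())
```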

If desired, proceed to start the cluster and configure a cluster schedule.

List of clusters #

The list of clusters displays the following information about each cluster:

  • Name: The name of the cluster, used to identify a cluster in the user interface as well as in the connection string for clients.
  • Status: The current status of the cluster.
  • Quick actions: Allows you to start, stop, and apply updates to a cluster.
  • Catalogs: Lists the configured catalogs used in the cluster.
  • Size: The configured cluster size.
  • Auto suspend: The configured time for an inactive cluster to transition to suspended status automatically.
  • Connect: Click Connection info to view and copy connection details and download connection files according to the client.
  • Click the edit icon to edit the cluster.
  • Click the sort icon beside the column names to sort the entries in the list.
  • Click the more actions icon to access a drop-down menu of more actions:
    • Query: To navigate to the Query editor using the current cluster as context.
    • Start: To start a stopped cluster.
    • Stop: To stop a running cluster.
    • Resume: To resume a suspended cluster.
    • Edit cluster: To edit the cluster.
    • Change owner: To change the owner of the cluster to a different role.
    • View cluster activity: To view the cluster activity of a running cluster.
    • Delete cluster: To delete the cluster.

Search clusters #

Use the search field to find clusters by name and catalogs.
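The same kind of lookup can be reproduced on an exported or hand-maintained list of clusters. The sketch below assumes case-insensitive substring matching on the cluster name and catalog names; the exact matching behavior of the search field is not specified here, and the data is made up.

```python
def search_clusters(clusters: list[dict], term: str) -> list[dict]:
    """Return clusters whose name or any catalog name contains the term."""
    needle = term.strip().lower()
    if not needle:
        return list(clusters)
    return [
        cluster
        for cluster in clusters
        if needle in cluster["name"].lower()
        or any(needle in catalog.lower() for catalog in cluster["catalogs"])
    ]


# Hypothetical example data.
clusters = [
    {"name": "etl-nightly", "catalogs": ["sales_mysql", "lake_s3"]},
    {"name": "adhoc", "catalogs": ["lake_s3"]},
    {"name": "demo", "catalogs": ["tpch"]},
]
print([c["name"] for c in search_clusters(clusters, "lake")])
```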

Start a cluster #

Click Start in the list of clusters.

A started cluster can automatically transition between running and suspended status depending on a defined cluster schedule.

Stop a cluster #

Click Stop in the list of clusters. To delete a cluster, you must stop it first.

A stopped cluster must be started manually. A defined cluster schedule does not start a stopped cluster.

Edit a cluster #

To access the Edit cluster panel, open the more actions menu for the cluster, then select the Edit cluster option. You can access the same edit options by clicking the edit icon in the list of clusters.

You can edit the cluster name, catalogs, cluster type, execution mode, and the idle shutdown time without affecting your users or any running queries. How changes take effect depends on the status of your cluster.

Stopped: Configuration changes for a stopped cluster are applied immediately and displayed in the list of clusters. The new configuration is used when the cluster is started.

Suspended: Configuration changes for a suspended cluster are applied immediately and displayed in the list of clusters. The new configuration is used when the cluster is resumed.

Running: Configuration changes for a running cluster open a dialog prompt offering the following choices:

  • Yes, update now: If you choose this option, queries running on the existing cluster finish, and a new cluster with the updated configuration is used for all new queries.
  • No, update later: If you choose this option, an update button appears under Quick actions in the list of clusters. Click the button once you are ready to apply the changes. Until then, the cluster continues to run using the current configuration. If you have an idle shutdown time configured, the cluster automatically transitions to suspended status once that time arrives. The new configuration is used when the cluster is resumed.
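The rules for when an edited configuration takes effect can be summarized in one function. This sketch uses hypothetical status strings and return messages; it restates the behavior described above and is not product code.

```python
def when_config_applies(status: str, update_now: bool = False) -> str:
    """Describe when an edited configuration is used, per the rules above."""
    if status == "stopped":
        return "when the cluster is started"
    if status == "suspended":
        return "when the cluster is resumed"
    if status == "running":
        if update_now:
            # "Yes, update now": running queries finish on the existing cluster.
            return "immediately for all new queries"
        # "No, update later": applied via the update button, or on resume
        # after the cluster suspends at its idle shutdown time.
        return "after clicking update, or when the cluster is resumed after suspension"
    raise ValueError(f"unknown status: {status!r}")


print(when_config_applies("running", update_now=True))
```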

Delete a cluster #

Before you can delete a cluster, you must stop it in the list of clusters. Once it is stopped, open the more actions menu for the cluster, then choose Delete cluster. Alternatively, you can access the Edit cluster panel and click Delete cluster.