Starburst Galaxy

    Galaxy cluster basics #

    In Starburst Galaxy, a cluster provides the resources to run queries against numerous catalogs. You can access the data exposed by the catalogs with the query editor or other clients.

    Access your account’s clusters from the navigation menu by clicking Admin > Clusters.

    Newly created Galaxy accounts typically have an example cluster. It is named sample for accounts created before August 30, 2023, and free-cluster for newer accounts.

    Starburst Galaxy lets you create, edit, delete, enable, and disable clusters, and lets you resume an auto-suspended cluster.

    Concepts #

    Creating and managing clusters is an essential task for a platform administrator in Starburst Galaxy. A cluster with the desired catalogs is required for a data consumer to use SQL statements in client tools to analyze the available data. The following concepts are important for understanding how to perform this work efficiently.

    Cluster maximums #

    For all clusters, the maximum allowed query processing time in Starburst Galaxy is four hours. Queries that run longer are terminated. Find relevant tips in query troubleshooting.

    The number of clusters allowed per account is limited to 30 by default. Contact Starburst support if you need a higher limit.

    To enable a cluster with more than 20 worker nodes, which includes the X-Large and 2X-Large cluster sizes, contact Starburst support.
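    The limits above can be written down as a few simple checks. This is a minimal sketch for illustration only: the constant and function names are hypothetical and are not Galaxy settings or API calls. The account limit is passed as a parameter because support can raise it.

```python
from datetime import timedelta

# Values taken from the text above. The names are illustrative only.
MAX_QUERY_RUNTIME = timedelta(hours=4)
DEFAULT_MAX_CLUSTERS_PER_ACCOUNT = 30
MAX_WORKERS_WITHOUT_SUPPORT = 20


def query_is_terminated(runtime: timedelta) -> bool:
    """Return True if a query has run longer than the four-hour maximum."""
    return runtime > MAX_QUERY_RUNTIME


def can_create_cluster(
    existing_clusters: int,
    limit: int = DEFAULT_MAX_CLUSTERS_PER_ACCOUNT,
) -> bool:
    """Return True if the account is still below its cluster limit."""
    return existing_clusters < limit


def needs_support_to_enable(worker_count: int) -> bool:
    """Return True if the worker count exceeds what can be enabled without support."""
    return worker_count > MAX_WORKERS_WITHOUT_SUPPORT
```

    For example, needs_support_to_enable(32) is true for an X-Large cluster, while a 20-worker cluster stays within the self-service range.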

    Cloud provider region and catalogs #

    Catalogs define the connection details to access a data source. Each data source is located in a specific region of a specific cloud provider. For example, your Cloud SQL for MySQL database might be hosted in the us-east1 region of Google Cloud.

    A cluster can include one or more catalogs. If multiple catalogs are configured, you can query them with SQL using the same client connection. You can also query the data in multiple catalogs within one SQL statement.
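    As an illustration of a single statement that spans two catalogs, the following sketch builds such a query as a string. It assumes the three-part catalog.schema.table naming used by Trino-based SQL engines. The catalog, schema, table, and column names are hypothetical, and the helper does not quote identifiers, so it only suits simple lowercase names.

```python
def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Build a fully qualified catalog.schema.table reference."""
    return f"{catalog}.{schema}.{table}"


# Hypothetical catalogs: one for a MySQL database, one for a data lake.
orders = qualified_name("mysql_sales", "shop", "orders")
customers = qualified_name("lake", "crm", "customers")

# One statement that joins tables from both catalogs.
federated_query = f"""
SELECT c.name, sum(o.total) AS revenue
FROM {orders} AS o
JOIN {customers} AS c ON o.customer_id = c.id
GROUP BY c.name
"""

print(federated_query)
```

    The resulting text can be pasted into the query editor or sent through any client connected to a cluster that includes both catalogs.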

    A cluster and all its configured catalogs are typically located in the same cloud provider and region. This allows for maximum performance and avoids data transfer costs for access across regions.

    Your organization can also query across regions within the same cloud provider. When catalogs are located in different regions, data transfer charges might be incurred.
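    The placement rules above can be sketched as a small check that compares each catalog's location with the cluster's location. The Location type, the placement function, and the sample catalog names are hypothetical. The text does not describe catalogs in a different cloud provider, so that case is only labeled, not interpreted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    provider: str  # for example "gcp"
    region: str    # for example "us-east1"


def placement(cluster: Location, catalog: Location) -> str:
    """Classify where a catalog sits relative to its cluster."""
    if catalog.provider != cluster.provider:
        return "other-provider"  # not covered by the text above
    if catalog.region != cluster.region:
        return "cross-region"    # data transfer charges might be incurred
    return "same-region"         # best performance, no cross-region transfer


cluster = Location("gcp", "us-east1")
catalogs = {
    "mysql_sales": Location("gcp", "us-east1"),
    "lake": Location("gcp", "europe-west1"),
}
report = {name: placement(cluster, loc) for name, loc in catalogs.items()}
print(report)
```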

    Size and scaling #

    The size of a cluster determines the number of server nodes, including one coordinator and many workers, used to process queries. A larger cluster, consisting of more nodes, is capable of processing more complex queries, handling more concurrent users, and providing higher performance by using more resources.

    The available sizes include Free, X-Small, Small, Medium, Large, X-Large, 2X-Large, and Custom. You can create a cluster with any size and change the size later as your needs change. Changing the size requires a restart of the cluster. All nodes in a cluster are identical. Best practice is to start with a smaller cluster and determine whether it is capable of processing all queries in your workload. Slow processing or out-of-memory failures typically suggest choosing a larger size.

    Reference this list to determine the correct cluster size for the workload. All cluster sizes have one coordinator node and the listed number of worker nodes.

    Size      Worker count  Notes
    X-Small   2
    Small     4
    Medium    8
    Large     16
    X-Large   32            Contact Support to enable.
    2X-Large  64            Contact Support to enable.
    Custom    1 to 20       Enables autoscaling by allowing separate user-defined minimum and maximum values between 1 and 20, inclusive, such as min=4, max=12. See when to use autoscaling. You can also use Custom to configure a non-autoscaling, intermediate-sized cluster by specifying the same minimum and maximum value, such as min=20, max=20.
    Free      1             Shares a node with the coordinator.
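    The size list can also be read as a lookup from size name to worker range. The following sketch encodes the worker counts listed above and the 1 to 20 bounds for Custom. The function names are hypothetical and this is not a Galaxy API.

```python
from typing import Optional, Tuple

# Fixed worker counts from the size list above.
FIXED_SIZES = {
    "Free": 1,
    "X-Small": 2,
    "Small": 4,
    "Medium": 8,
    "Large": 16,
    "X-Large": 32,
    "2X-Large": 64,
}
CUSTOM_MIN, CUSTOM_MAX = 1, 20


def worker_range(
    size: str,
    min_workers: Optional[int] = None,
    max_workers: Optional[int] = None,
) -> Tuple[int, int]:
    """Return the (minimum, maximum) worker count for a cluster size."""
    if size == "Custom":
        if min_workers is None or max_workers is None:
            raise ValueError("Custom requires both a minimum and a maximum")
        if not CUSTOM_MIN <= min_workers <= max_workers <= CUSTOM_MAX:
            raise ValueError("Custom values must satisfy 1 <= min <= max <= 20")
        return (min_workers, max_workers)
    if size not in FIXED_SIZES:
        raise ValueError(f"Unknown cluster size: {size}")
    count = FIXED_SIZES[size]
    return (count, count)


def is_autoscaling(size: str, min_workers=None, max_workers=None) -> bool:
    """A cluster autoscales only when its minimum and maximum differ."""
    low, high = worker_range(size, min_workers, max_workers)
    return low < high
```

    With these definitions, worker_range("Custom", 4, 12) describes an autoscaling cluster, while worker_range("Custom", 20, 20) describes a fixed 20-worker cluster.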

    Learn more about configuring autoscaling on a new or existing cluster.

    Cluster status and transitions #

    A cluster can be in one of the following states:

    Not enabled
    A cluster that is not enabled consists of a small configuration set only. No significant resources are used, and no costs are incurred.
    Starting
    A cluster currently entering the running state.
    Running
    A running cluster consists of a number of server nodes. It remains in the running state while users are submitting queries for processing.
    Suspended
    A suspended cluster consists of a small configuration set and a mechanism to listen for incoming user requests. It does not include any actively running server nodes, and no costs are incurred.

    Configuration changes to Galaxy catalogs are applied immediately for suspended or disabled clusters. For running clusters, catalog changes are saved and applied when you run the next query. In some cases, after you change a catalog configuration, Galaxy may show a dialog asking you to manually stop and restart the cluster.

    A newly created cluster starts in the not enabled state, and can be enabled in the cluster list.

    A running cluster can be manually disabled in the cluster list.
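    The states and transitions can be summarized as a small state machine. This is a sketch only: the event names are hypothetical, and the table contains only the transitions this page describes, including the suspend and resume behavior covered in the Uptime section. That enabling a cluster leads to the Starting state is an assumption, not something the text states.

```python
from enum import Enum


class State(Enum):
    NOT_ENABLED = "Not enabled"
    STARTING = "Starting"
    RUNNING = "Running"
    SUSPENDED = "Suspended"


# (current state, event) -> next state. Event names are illustrative.
TRANSITIONS = {
    (State.NOT_ENABLED, "enable"): State.STARTING,  # assumption, see lead-in
    (State.STARTING, "started"): State.RUNNING,
    (State.RUNNING, "auto_suspend"): State.SUSPENDED,
    (State.SUSPENDED, "query_submitted"): State.STARTING,
    (State.RUNNING, "disable"): State.NOT_ENABLED,
}


def next_state(state: State, event: str) -> State:
    """Return the state that follows an event, or raise if it is not modeled."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"No transition for {event!r} from {state.value}") from None


# Walk a cluster through a typical lifecycle.
state = State.NOT_ENABLED
for event in ["enable", "started", "auto_suspend", "query_submitted", "started"]:
    state = next_state(state, event)
print(state.value)
```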

    Uptime #

    A running cluster becomes idle when no queries are submitted and all processing of queries is completed. Idle clusters automatically transition to suspended status when the configured auto-suspend time is reached. Available auto-suspend times include 1 minute, 5 minutes, 15 minutes, 30 minutes, and 1 hour.

    When a user submits a query to a suspended cluster, the cluster is started, and the query is processed. The user must wait for the cluster to start, which typically takes between one and five minutes.

    You can also configure a cluster to Never suspend. The cluster then remains up and running even when it is idle and no queries are processed. The advantage is that any submitted query is processed immediately, because there is no wait for the cluster to start. The disadvantage is the increased cost of running a cluster continuously. The Never suspend option is not available for free clusters. By default, an accelerated cluster is configured to Never suspend because restarting a suspended cluster requires warming up the cache again and recreating indexes.

    [Image: Auto-suspend tooltip for an accelerated cluster]
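    The auto-suspend rule can be sketched as a single comparison between the idle time and the configured setting. The function and constant names are hypothetical, and None is used here to model the Never suspend option.

```python
from datetime import timedelta
from typing import Optional

# Auto-suspend times listed in the text above.
AUTO_SUSPEND_CHOICES = (
    timedelta(minutes=1),
    timedelta(minutes=5),
    timedelta(minutes=15),
    timedelta(minutes=30),
    timedelta(hours=1),
)


def should_suspend(idle_for: timedelta, auto_suspend: Optional[timedelta]) -> bool:
    """Return True once an idle cluster has reached its auto-suspend time.

    Pass None for auto_suspend to model the Never suspend option.
    """
    if auto_suspend is None:
        return False
    if auto_suspend not in AUTO_SUSPEND_CHOICES:
        raise ValueError("Unsupported auto-suspend time")
    return idle_for >= auto_suspend
```

    For example, a cluster set to 5 minutes that has been idle for 4 minutes keeps running, and a Never suspend cluster keeps running regardless of idle time.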

    Use cluster scheduling to transition clusters between running and suspended status automatically, based on specified days and times.

    Execution mode #

    When configuring your cluster, choose Standard, Fault tolerant, or Accelerated from the Execution mode drop-down menu.

    [Image: Execution mode drop-down menu, expanded]

    Learn more about the three different execution modes that Starburst Galaxy has to offer.

    Query result caching #

    For all cluster sizes except Free, you can optionally set the cluster to cache query results for a specified period of time. For more information, see Query result caching.