In Starburst Galaxy, a cluster provides the resources to run queries against numerous catalogs. You can access the data exposed by the catalogs with the query editor or other clients.
Access your account’s clusters from the navigation menu by clicking Admin > Clusters.
Newly created Galaxy accounts typically have an example cluster named sample (for accounts created before August 30, 2023) or free-cluster (for newer accounts).
Starburst Galaxy lets you create, edit, delete, enable, and disable clusters, and lets you resume an auto-suspended cluster.
Creating and managing clusters is an essential task for a platform administrator in Starburst Galaxy. A cluster with the desired catalogs is required for a data consumer to use SQL statements in client tools to analyze the available data. The following concepts are important for understanding how to perform this work efficiently.
For all clusters, the maximum allowed query processing time in Starburst Galaxy is four hours. Longer running queries are terminated. Find relevant tips in query troubleshooting.
The number of clusters allowed per account is limited to 30 by default. Contact Starburst support if you need a higher limit.
To enable a cluster with more than 20 worker nodes (which includes the X-Large and 2X-large cluster sizes), contact Starburst support.
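The limits above can be checked before you request a new cluster. The following Python snippet is a minimal sketch, not part of any Starburst tooling: the function name `check_plan` and its parameters are hypothetical, and the constants simply restate the default limits described in this section.

```python
from datetime import timedelta

# Default limits as described in this section (assumed unchanged for your account).
MAX_QUERY_RUNTIME = timedelta(hours=4)
DEFAULT_MAX_CLUSTERS_PER_ACCOUNT = 30
MAX_WORKERS_WITHOUT_SUPPORT = 20


def check_plan(existing_clusters: int, requested_workers: int,
               expected_runtime: timedelta) -> list[str]:
    """Return a list of issues for a planned cluster (hypothetical helper)."""
    issues = []
    # Adding one more cluster must not exceed the per-account default.
    if existing_clusters + 1 > DEFAULT_MAX_CLUSTERS_PER_ACCOUNT:
        issues.append("cluster limit reached: contact Starburst support")
    # More than 20 workers requires a support request.
    if requested_workers > MAX_WORKERS_WITHOUT_SUPPORT:
        issues.append("more than 20 workers: contact Starburst support")
    # Queries running longer than four hours are terminated.
    if expected_runtime > MAX_QUERY_RUNTIME:
        issues.append("query would exceed the four hour limit")
    return issues


if __name__ == "__main__":
    for issue in check_plan(30, 24, timedelta(hours=5)):
        print(issue)
```

An empty list means the plan fits within the default limits. If your account has a raised limit, adjust the constants accordingly.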
Catalogs define the connection details to access a data source. Any data source is located in a specific cloud region of a specific cloud provider. For example, your Cloud SQL for MySQL database is hosted in the us-east1 region of Google Cloud.
A cluster can include one or more catalogs. If multiple catalogs are configured, you can query them with SQL using the same client connection. You can also query the data in multiple catalogs within one SQL statement.
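As an illustration of querying two catalogs in one statement, the following Python sketch builds a SQL join that refers to each table by its fully qualified catalog.schema.table name. The catalog, schema, table, and column names (`sales_mysql`, `crm_lake`, and so on) are hypothetical placeholders, and the helper functions are not part of any Starburst client. The snippet only constructs the query text; submit it with the query editor or the client of your choice.

```python
def fully_qualified(catalog: str, schema: str, table: str) -> str:
    """Return a catalog.schema.table name with each part double-quoted."""
    parts = (catalog, schema, table)
    # Double any embedded quote so the identifier stays valid SQL.
    return ".".join('"' + p.replace('"', '""') + '"' for p in parts)


def build_cross_catalog_query() -> str:
    """Build one SQL statement that joins tables from two catalogs."""
    # Hypothetical catalogs configured on the same cluster.
    orders = fully_qualified("sales_mysql", "shop", "orders")
    customers = fully_qualified("crm_lake", "public", "customers")
    return (
        "SELECT c.name, sum(o.total) AS total_spent\n"
        f"FROM {orders} AS o\n"
        f"JOIN {customers} AS c ON o.customer_id = c.id\n"
        "GROUP BY c.name"
    )


if __name__ == "__main__":
    print(build_cross_catalog_query())
```

Because both catalogs are attached to the same cluster, a single client connection is enough to run the resulting statement.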
A cluster and all its configured catalogs are typically located in the same cloud provider and region. This allows for maximum performance and avoids data transfer costs for access across regions.
Your organization can also query across regions within the same cloud provider. When catalogs are located in different regions, data transfer charges might be incurred.
The size of a cluster determines the number of server nodes, including one coordinator and many workers, used to process queries. A larger cluster, consisting of more nodes, is capable of processing more complex queries, handling more concurrent users, and providing higher performance by using more resources.
You can create a cluster of any size, and change the size based on current needs. All nodes in a cluster are identical.
Best practice is to start with a smaller cluster and determine whether it is capable of processing all queries in your workload. Slow processing or out-of-memory failures typically suggest choosing a larger size.
Learn more about configuring autoscaling on a new or existing cluster.
A cluster can be in one of the following states:
Configuration changes to Galaxy catalogs are implemented immediately for suspended or disabled clusters. For running clusters, catalog changes are saved and implemented when you run the next query. For some cluster conditions, after you make catalog configuration changes, Galaxy may show a dialog asking you to manually stop and restart the cluster.
A newly created cluster begins un-enabled, and can be enabled in the cluster list.
A running cluster can be manually disabled in the cluster list.
A running cluster becomes idle when no queries are submitted and all processing of queries is completed. Idle clusters automatically transition to suspended status when the configured auto-suspend time is reached. Available auto-suspend times include 1 minute, 5 minutes, 15 minutes, 30 minutes, and 1 hour.
When a user submits a query to a suspended cluster, the cluster is started, and the query is processed. The user must wait for the cluster to start, which typically takes between one and five minutes.
You can also configure a cluster to Never suspend. This causes the cluster to remain up and running, even if no queries are processed and the cluster is idling. The advantage of this behavior is that any issued query can be processed immediately, because there is no wait for the cluster to start. The disadvantage is the increased cost incurred for continuously running a cluster. The Never suspend option is not available for free clusters. By default, an accelerated cluster is configured to Never suspend because restarting a suspended cluster requires warming up the cache again and recreating indexes.
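The auto-suspend behavior described above can be modeled with a few lines of Python. This is a simplified sketch for reasoning about timing, not Galaxy's implementation: the function `cluster_status`, its parameters, and the lowercase status labels are assumptions made for illustration, while the option names and durations come from this section.

```python
from datetime import datetime, timedelta
from typing import Optional

# Auto-suspend options from this section; None models "Never suspend".
AUTO_SUSPEND_OPTIONS: dict[str, Optional[timedelta]] = {
    "1 minute": timedelta(minutes=1),
    "5 minutes": timedelta(minutes=5),
    "15 minutes": timedelta(minutes=15),
    "30 minutes": timedelta(minutes=30),
    "1 hour": timedelta(hours=1),
    "Never suspend": None,
}


def cluster_status(last_query_finished: datetime, now: datetime,
                   auto_suspend: str, queries_running: bool = False) -> str:
    """Return 'running', 'idle', or 'suspended' for an enabled cluster."""
    if queries_running:
        return "running"
    limit = AUTO_SUSPEND_OPTIONS[auto_suspend]
    # With "Never suspend" the cluster stays up while idle.
    if limit is None:
        return "idle"
    idle_for = now - last_query_finished
    return "suspended" if idle_for >= limit else "idle"


if __name__ == "__main__":
    finished = datetime(2024, 1, 1, 12, 0)
    print(cluster_status(finished, finished + timedelta(minutes=20), "15 minutes"))
```

A shorter auto-suspend time lowers cost for sporadic workloads, at the price of a startup wait on the next query. A longer time or Never suspend trades cost for immediate availability.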
Use cluster scheduling to transition clusters between running and suspended status automatically, based on specified days and times.
When configuring your cluster, choose Standard, Fault tolerant, or Accelerated from the Execution mode drop-down menu.
Learn more about the three different execution modes that Starburst Galaxy has to offer.
For all cluster sizes except Free, you can optionally set the cluster to cache query results for a specified period of time. For more information, see Query result caching.