Optimizing query performance

Starburst Galaxy and Starburst Enterprise platform (SEP) are fast, but there are still many opportunities to make them even faster, depending on how you write your queries.

Learn how to use EXPLAIN and ANALYZE to improve your query performance in this training video presented by one of our founders, Martin Traverso. For your convenience, we’ve divided the training video into topic sections and linked the relevant parts of our documentation below.

The query lifecycle

Knowing what happens under the hood when a SQL query runs can help you write queries that capitalize on possible optimizations and avoid approaches that cost you performance. This section provides an overview of the stages a query passes through as it is executed:

  • Parsing
  • Analysis
  • Planning
  • Optimization
  • Scheduling and execution

Running time: ~12 min.
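
One way to observe these stages from the SQL side is to request different EXPLAIN types for the same statement. The following is a minimal sketch, assuming a catalog named `tpch` is configured on your cluster (an assumption, not something this page sets up); substitute any table you can query.

```sql
-- Assumes a catalog named tpch is available; replace with your own table.

-- Parsing and analysis only: reports whether the statement is valid
-- without producing a plan.
EXPLAIN (TYPE VALIDATE)
SELECT orderkey, totalprice
FROM tpch.tiny.orders
WHERE totalprice > 1000;

-- Planning and optimization: shows the logical plan.
EXPLAIN (TYPE LOGICAL)
SELECT orderkey, totalprice
FROM tpch.tiny.orders
WHERE totalprice > 1000;

-- Preparation for scheduling: shows the plan split into fragments
-- that are distributed across the cluster.
EXPLAIN (TYPE DISTRIBUTED)
SELECT orderkey, totalprice
FROM tpch.tiny.orders
WHERE totalprice > 1000;
```

None of these statements executes the query itself; they only report what the engine would do.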

The EXPLAIN statement in detail

If you want to understand what the Trino query engine is basing its decisions on as it executes a query, use the EXPLAIN statement. This section walks you through this very informative tool in detail.

  • EXPLAIN (Starburst Galaxy or SEP)
  • EXPLAIN vs EXPLAIN ANALYZE (Starburst Galaxy or SEP)
  • Fragment structure, distribution, row layout, estimates, and performance stats in EXPLAIN ANALYZE (SEP)
  • Exchanges (SEP)

Click the links to read more on each topic in our reference manuals. Some considerations, such as exchanges, are handled for you by Starburst Galaxy.

Running time: ~20 min.
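
To make the difference between EXPLAIN and EXPLAIN ANALYZE concrete, here is a small sketch. It assumes a `tpch` catalog is configured on your cluster (an assumption for illustration); any two joinable tables work equally well.

```sql
-- Assumes a catalog named tpch is available; replace with your own tables.

-- EXPLAIN: returns the plan with estimates only. The query is not run.
EXPLAIN
SELECT c.name, sum(o.totalprice) AS total
FROM tpch.tiny.orders o
JOIN tpch.tiny.customer c ON o.custkey = c.custkey
GROUP BY c.name;

-- EXPLAIN ANALYZE: runs the query and returns the plan annotated with
-- the statistics observed during execution, per fragment and operator.
EXPLAIN ANALYZE
SELECT c.name, sum(o.totalprice) AS total
FROM tpch.tiny.orders o
JOIN tpch.tiny.customer c ON o.custkey = c.custkey
GROUP BY c.name;
```

Because EXPLAIN ANALYZE executes the statement, it takes as long as the query itself and consumes the same resources, so it is best tried first on a small data set.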

General optimizations

The content in this section is more technique-oriented and covers a complex subject. We strongly suggest watching it all the way through first to gain a broad awareness of how the way you write a query can affect its performance before trying these techniques on your own. For further reading, we recommend our SEP pushdown documentation.

The SQL engine relies on table statistics to decide which optimizations to apply. Enabling dynamic filtering can take optimizations even further. We recommend reading about these features to ensure you are getting the best performance possible out of your SEP cluster. With Starburst Galaxy, this is handled for you. Topics covered in this section:

  • Constant folding
  • Predicate pushdown
  • Predicate pushdown into the Hive connector
  • Hive partition pruning
  • Hive bucket pruning
  • Row group skipping for ORC and Parquet
  • Limit, partial limit, and aggregation pushdown
  • Skew

Running time: ~58 min.
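
As an illustration of constant folding and predicate pushdown, the sketch below uses a hypothetical Hive table, `hive.web.page_views`, partitioned by a DATE column named `view_date`. The catalog, schema, table, and column names are all assumptions made for the example. Whether a given predicate is actually pushed down depends on the connector and your version, so treat the plan output as the source of truth.

```sql
-- Hypothetical table: hive.web.page_views, partitioned by view_date (DATE).
-- All names here are illustrative.

-- The right-hand side is a constant expression, so the optimizer can
-- evaluate it once at planning time. Look for the already-computed date
-- in the filter or table scan of the plan.
EXPLAIN
SELECT user_id, url
FROM hive.web.page_views
WHERE view_date >= DATE '2024-01-01' + INTERVAL '7' DAY;

-- EXPLAIN (TYPE IO) lists the input tables together with the constraints
-- applied to them, which helps confirm that a predicate on the partition
-- column reached the connector.
EXPLAIN (TYPE IO)
SELECT count(*)
FROM hive.web.page_views
WHERE view_date = DATE '2024-01-08';
```

If the constraint on `view_date` does not appear in the output, the engine is likely reading more partitions than the query needs.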

SEP offers several properties to control how the optimizer handles certain operations. With Starburst Galaxy, this is handled for you.

Cost-based optimizations

This section presents an overview of how cost-based optimizations work in Starburst clusters, and provides context for the following recommended reading:

  • Partitioned and broadcast joins
  • Disabling cost-based optimizations
  • Join reordering
  • Table statistics
  • Computing statistics with ANALYZE (Starburst Galaxy or SEP)

Running time: ~13 min.
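
The topics above can be sketched in a few statements. The table `hive.sales.orders` is hypothetical, and the sketch assumes a connector that supports ANALYZE and a platform that lets you set these session properties; on Starburst Galaxy some of this may be managed for you.

```sql
-- Hypothetical table; replace with one from your own catalog.

-- Collect table and column statistics for the cost-based optimizer.
ANALYZE hive.sales.orders;

-- Inspect the statistics the optimizer can see.
SHOW STATS FOR hive.sales.orders;

-- Force a join distribution instead of letting the optimizer choose.
-- Valid values are AUTOMATIC, PARTITIONED, and BROADCAST.
SET SESSION join_distribution_type = 'BROADCAST';

-- Turn off cost-based join reordering so joins run in the written order.
-- Valid values are AUTOMATIC, ELIMINATE_CROSS_JOINS, and NONE.
SET SESSION join_reordering_strategy = 'NONE';

-- Return both properties to their defaults for the rest of the session.
RESET SESSION join_distribution_type;
RESET SESSION join_reordering_strategy;
```

Overriding the optimizer is mainly useful for comparison: run EXPLAIN with and without the override to see how the join plan changes.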