Optimizing query performance #
Starburst Enterprise platform (SEP) is fast. But did you know that there are still many opportunities to make it even faster depending on how you write your queries?
Learn how to use
ANALYZE to improve your query performance in
this training video presented by one of our founders, Martin Traverso. For your
convenience, we’ve divided the video training course up into topic sections, and
provided links to the relevant parts of our documentation below.
The query lifecycle #
Knowing what’s happening under the hood in SQL can help you to write queries that capitalize on possible optimizations and avoid approaches that will cost you performance. This section provides an overview of what happens as a query is executed.
- Scheduling and execution
The EXPLAIN statement in detail #
If you want to understand what the SEP engine is basing its decisions on
as it executes a query, you need to use the
EXPLAIN statement. This section
walks you through this very informative tool in detail.
- EXPLAIN vs EXPLAIN ANALYZE
- Fragment structure, distribution, row layout, estimates, and performance stats in EXPLAIN ANALYZE
General optimizations #
The content in this section is more technique-oriented, and is a complex subject. We strongly suggest watching it all the way through thoroughly first to gain a broad awareness of how you write a query can affect its performance before trying these on your own. For further reading, we recommend our pushdown documentation.
The SQL engine relies on table statistics to make decisions on optimizations. Enabling dynamic filtering can take optimizations even further. We recommend reading about these powerful features to ensure you are getting the best performance possible out of your cluster:
- Constant folding
- Predicate pushdown
- Predicate pushdown into the Hive connector
- Hive partition pruning
- Hive bucket pruning
- Row group skipping for ORC and Parquet
- Limit, partial limit, and aggregation pushdown
SEP offers several properties to control how the optimizer handles certain operations.
Cost-based optimizations #
This section presents on overview of how cost-based optimizations work in SEP, and provides great context for the following recommended reading:
- Partitioned and broadcast joins
- Disabling cost-based optimizations
- Join reordering
- Table statistics
- Computing statistics with ANALYZE