Insights query details#

Query details are available in Starburst Insights for any query in the Insights query history. Click on a Query ID in the display to access the details for a particular query.

The Query details tab displays either general or advanced information about a specific query, depending on the selected tab. Use the Back button on the top right hand corner of the view to navigate to the query history.

General#

The general view provides higher level information about the query processing.

Query text#

The text area allows you to view the SQL statement as submitted to the SEP cluster for processing. You can copy the text with the copy icon, or maximize the text area with the window icon. You can re-run the query in Worksheets by clicking the >_ icon.

Query details text area

Basic information#

The displayed metadata about the query processing includes the following information:

  • ID

  • User

  • Principal

  • User groups

  • Client

  • Client address

  • CPU time

  • Elapsed time

Query execution#

The Query execution diagram displays performance and timing data about query processing:

Query execution diagram

Cluster resources#

The cluster resources view provides information about the cluster load during the time when the query was queued and executing. Query execution times are influenced by the available resources (CPU, memory), and the number of concurrent queries. This view provides you with the context necessary to interpret query execution data.

Query cluster resources

The charts included in this view are similar to the ones on the cluster history pane. They additionally show the time range when the query ran as a highlighted area on the timeline.

Stages#

The stages view shows a table of expandable information panels with details about query processing stages. A stage is the runtime incarnation of a plan fragment in a distributed query plan, and is identified by a stage number. The maximum number of stages in a query plan is defined by the query.max-stage-count configuration property.

Query details Stages pane

Caution

This view is for advanced users with a very good understanding of query processing. The following materials can help you begin to understand the necessary concepts:

Information is included in the view that applies to the query and its processing as a whole:

  • User: The user that is running the query.

  • Query state: Queued, planning, starting, running, blocked, finishing, finished, failed.

  • CPU time: Total CPU time used by this query, including the cumulative time used by all tasks. Can exceed wall time.

  • Elapsed time: The elapsed wall time for the query.

  • Active percentage: The percentage of CPU time out of the total scheduled time, which includes both CPU time and waiting times. This metric estimates the query’s processing efficiency. Ideally, the value should be a high percentage. A low percentage indicates the query is spending a long time reading from storage.

Query statuses are dependent on the statuses of its stages and tasks. Generally, queries start in the QUEUED state, and move on briefly to the PLANNING state. Queries may return to QUEUED state if they must briefly wait for resources. If the resources are not available fast enough, the time period depending on a myriad of configuration properties, it moves into a BLOCKED state. If at least one worker task is running and nothing is blocked, the query is in the RUNNING state.

Note

The PLANNING stage should be very brief for most queries. If you notice that queries are spending more than a brief amount of time in the PLANNING stage consistently, your cluster may have insufficient resources.

Stages encompass all the tasks of the work described by the stage’s plan fragment. Stages can be in one of several states:

  • Planned: The coordinator created the query plan, and from that, the distributed query plan.

  • Scheduling: The coordinator is scheduling tasks across workers.

  • Scheduling splits: The coordinator is scheduling tasks on workers using the list of splits.

  • Scheduled: All workers have tasks, and the coordinator is waiting for the first task to start running.

  • Running: There is at least one running task.

  • Flushing: Stage has finished executing and output being consumed. At least one of the tasks is flushing and the non-flushing tasks are finished.

  • Finished: The query has finished executing and all output has been consumed.

  • Canceled: The query was canceled by the user.

  • Aborted: Stage was aborted by the coordinator.

  • Failed: Query execution failed.

Several similar statistics are shown for each stage as a whole:

  • State

  • CPU time

  • Active percentage

  • Input buffer

  • Number of tasks

Each stage can be expanded or collapsed with the chevron in its upper right corner.

Tasks#

Stages can have multiple tasks, which have their own row with separate statistics and statuses in the stage panel. Tasks are assigned to workers to be executed in parallel. Tasks can be in one of the following states:

  • Planned: The plan has been created by the coordinator.

  • Running: The task is running.

  • Flushing: The task is flushing results after having been consumed.

  • Finished: The task is complete and its data passed on to the next split.

  • Canceled: The task was canceled by the coordinator due to a user action.

  • Aborted: The task was aborted by the worker or coordinator.

  • Failed: The task failed to execute.

The following per-task statistics are displayed:

  • Host: URI of the node processing the task

  • State: State of the task

  • Queued splits: Number of splits in the task not yet running. This value fluctuates as the task runs.

  • Running splits: Number of splits in the task not yet complete. This value fluctuates as the task runs.

  • Blocked splits: Number of blocked splits in the task. This value fluctuates as the task runs.

  • Completed splits: Number of completed splits in the task

  • Rows: Number of rows read or written by the task; the number increases as the task runs.

  • Rows per second: Rate at which rows are read or written by the task

  • Bytes: Number of bytes read or written by the task; the number increases as the task runs.

  • Bytes per second: Rate at which bytes are read or written by the task

  • Elapsed: Time elapsed wall time since the task creation time

  • CPU time: CPU time used by this task

  • Buffered: Amount of data waiting to be sent to the next processing stage (high values may indicate that the upstream stage is slow)

Advanced#

The advanced view provides lower level details about the query processing.

Query details Advanced pane

Tables#

The table shows a list of all tables accessed by the query including the following information:

  • Table

  • Schema

  • Catalog

  • Rows processed

  • Bytes processed

  • Read or write operation

Query plan#

The query plan used for processing the query on the cluster. It can be copied or maximized using the icons.

Execution details#

You can see the following timing and performance details about query processing on the cluster:

  • Elapsed time

  • CPU time

  • Parallelism

  • Active

  • Rows read

  • Bytes read

  • Rows written

  • Bytes written

  • Peak memory