Starburst Galaxy

    Client protocol

    Starburst Galaxy uses the Trino client protocol, an HTTP-based protocol that lets clients submit SQL queries and receive results. The protocol consists of a sequence of REST API calls to the coordinator of the Galaxy cluster. At a high level, the process works as follows:

    1. The client submits SQL query text to the coordinator of the cluster.
    2. The coordinator starts processing the query.
    3. The coordinator returns a partial result set and a URI, nextUri, that points back to the coordinator.
    4. The client receives the result set and requests more data from nextUri.
    5. The coordinator continues processing the query and returns further data with a new nextUri.
    6. The client and coordinator continue with steps 4 and 5 until all result set data is returned to the client, or the client stops requesting more data.
    7. If the client fails to fetch the result set, the coordinator stops further processing and fails the query.
    8. When the query is complete, the final response reports the state FINISHED.
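
    The steps above can be sketched as a small client loop. This is a minimal illustration, not a driver implementation: the transport is abstracted behind two callables, `FakeCoordinator` is a hypothetical in-memory stand-in for a real cluster, and the response field names (`data`, `nextUri`, `error`, `stats.state`) are assumptions based on the JSON responses of the Trino client protocol.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Response = Dict[str, Any]


def run_query(
    submit: Callable[[str], Response],
    follow: Callable[[str], Response],
    sql: str,
) -> Iterator[List[Any]]:
    """Yield result rows, following nextUri until no further URI is returned."""
    response = submit(sql)  # steps 1-3: submit the query, receive the first response
    while True:
        if "error" in response:
            raise RuntimeError(response["error"].get("message", "query failed"))
        # A response may carry no rows, for example while the query is still starting.
        for row in response.get("data") or []:
            yield row
        next_uri: Optional[str] = response.get("nextUri")
        if next_uri is None:
            return  # no nextUri: this was the final response
        response = follow(next_uri)  # steps 4-5: request more data


class FakeCoordinator:
    """Hypothetical in-memory coordinator, used only to exercise the loop."""

    def __init__(self, batches: List[List[List[Any]]]) -> None:
        self._batches = batches

    def _page(self, index: int) -> Response:
        last = index == len(self._batches) - 1
        page: Response = {
            "data": self._batches[index],
            "stats": {"state": "FINISHED" if last else "RUNNING"},
        }
        if not last:
            page["nextUri"] = f"fake://coordinator/page/{index + 1}"
        return page

    def submit(self, sql: str) -> Response:
        return self._page(0)

    def follow(self, uri: str) -> Response:
        return self._page(int(uri.rsplit("/", 1)[1]))


coordinator = FakeCoordinator([[[1, "a"]], [], [[2, "b"], [3, "c"]]])
rows = list(run_query(coordinator.submit, coordinator.follow, "SELECT * FROM t"))
print(rows)
```

    Because `run_query` is a generator, a caller that stops iterating simply stops requesting more data, which corresponds to the second exit condition in step 6.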

    The client protocol has two modes of operation: direct and spooling.

    Spooling protocol

    The spooling protocol uses an object storage location to hold result data for retrieval by the client. The workers write result set data to the storage in parallel. The coordinator sends the client only the URLs of the individual data segments on the object storage, not the data itself. The spooling protocol also provides compressed variants of the JSON serialization format.
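
    The data flow described above can be sketched as follows. This is an illustration under stated assumptions, not Galaxy's implementation: `OBJECT_STORE` is a hypothetical in-memory dictionary standing in for object storage, the `fake://` URLs are invented, and gzip from the Python standard library stands in for whichever compressed JSON variant a real client negotiates.

```python
import gzip
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Row = List[Any]

# Hypothetical object storage: maps a segment URL to its compressed bytes.
OBJECT_STORE: Dict[str, bytes] = {}


def write_segment(url: str, rows: List[Row]) -> str:
    """Worker side: serialize rows as JSON, compress them, store them under url."""
    OBJECT_STORE[url] = gzip.compress(json.dumps(rows).encode("utf-8"))
    return url


def read_segment(url: str) -> List[Row]:
    """Client side: download one segment, decompress it, decode the JSON rows."""
    return json.loads(gzip.decompress(OBJECT_STORE[url]).decode("utf-8"))


def fetch_spooled(
    urls: List[str],
    read: Callable[[str], List[Row]] = read_segment,
    max_workers: int = 4,
) -> List[Row]:
    """Fetch all segments concurrently and return their rows in segment order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the order of the input URLs.
        segments = list(pool.map(read, urls))
    return [row for segment in segments for row in segment]


# The coordinator hands the client only these URLs, never the rows themselves.
segment_urls = [
    write_segment("fake://bucket/query-1/segment-0", [[1, "a"], [2, "b"]]),
    write_segment("fake://bucket/query-1/segment-1", [[3, "c"]]),
]
all_rows = fetch_spooled(segment_urls)
print(all_rows)
```

    Fetching segments concurrently is one way a client can benefit from data that no longer has to pass through the coordinator; whether a given driver does so is not stated here.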

    The spooling protocol has the following characteristics, compared to the direct protocol:

    • Provides higher throughput for data transfer, especially for queries that return large result sets.
    • Results in faster query processing completion on the cluster, independent of the client retrieving all data, since data is read from the object storage.
    • Reduces CPU and I/O load on the coordinator.
    • Requires object storage.
    • Requires newer client drivers or client applications that support the spooling protocol and actively request it.
    • Requires that clients have access to the object storage.
    • Remains compatible with older client drivers and client applications, which automatically fall back to the direct protocol when they do not support the spooling protocol.

    The latest client libraries that are compatible with the spooling protocol default to spooling. If there is no benefit to spooling, Galaxy falls back to the direct client protocol.
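
    Because the cluster may fall back to the direct protocol, a client that requests spooling still has to handle both kinds of response. The sketch below shows one way to do that. The response shapes are assumptions, not taken from this page: rows inline as a list for the direct protocol, and an object with `encoding` and `segments` for spooling, where each segment is either `inline` (base64-encoded data) or `spooled` (a `uri` into object storage). The `fetch_segment` callable and the `fake://` URL are hypothetical, and uncompressed JSON is used to keep the example short.

```python
import base64
import json
from typing import Any, Callable, Dict, List

Row = List[Any]


def rows_from_response(
    response: Dict[str, Any],
    fetch_segment: Callable[[str], bytes],
) -> List[Row]:
    """Return the rows of one response, whichever protocol produced it."""
    data = response.get("data")
    if data is None:
        return []  # no rows in this response
    if isinstance(data, list):
        return data  # direct protocol: rows arrive inline from the coordinator
    rows: List[Row] = []
    for segment in data["segments"]:  # spooling protocol (assumed shape)
        if segment["type"] == "inline":
            payload = base64.b64decode(segment["data"])
        else:
            payload = fetch_segment(segment["uri"])  # read from object storage
        rows.extend(json.loads(payload))
    return rows


# Hypothetical object storage holding one spooled segment.
storage = {"fake://bucket/seg-0": json.dumps([[2, "b"]]).encode("utf-8")}

direct_response = {"data": [[1, "a"]]}
spooled_response = {
    "data": {
        "encoding": "json",
        "segments": [
            {
                "type": "inline",
                "data": base64.b64encode(json.dumps([[1, "a"]]).encode("utf-8")).decode("ascii"),
            },
            {"type": "spooled", "uri": "fake://bucket/seg-0"},
        ],
    }
}

direct_rows = rows_from_response(direct_response, storage.__getitem__)
spooled_rows = rows_from_response(spooled_response, storage.__getitem__)
print(direct_rows, spooled_rows)
```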

    The following client drivers and applications support the spooling protocol:

    The spooling protocol is supported for Amazon S3 catalogs.

    Direct protocol

    The direct protocol transfers all data from the workers to the coordinator, and from there directly to the client.

    The direct protocol has the following characteristics, compared to the spooling protocol:

    • Provides lower data transfer throughput, especially for queries that return large result sets.
    • Results in slower query processing completion on the cluster, since data is provided by the coordinator and read by the client sequentially.
    • Requires no object storage.
    • Increases CPU and I/O load on the coordinator.
    • Works with older client drivers and client applications without support for the spooling protocol.