Client protocol
Starburst Galaxy uses the Trino client protocol, which is an HTTP-based
protocol that enables clients to submit SQL queries and receive results. The
protocol is a sequence of REST API calls to the coordinator of the
Galaxy cluster. The following is a high-level overview of the process:
- The client submits SQL query text to the coordinator of the cluster.
- The coordinator starts processing the query.
- The coordinator returns a result set and a URI, nextUri, on the coordinator.
- The client receives the result set and initiates another request for more
data from the URI nextUri.
- The coordinator continues processing the query and returns further data with
a new URI.
- The client and coordinator repeat the previous two steps until all result set
data is returned to the client, or the client stops requesting more data.
- If the client fails to fetch the result set, the coordinator does not
initiate further processing and fails the query.
- When the query is complete, the final response reports the state FINISHED.
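The request loop described above can be sketched in a few lines. This is a minimal illustration, not a complete client: the transport is passed in as two callables (submit and follow, both hypothetical names) so that the loop runs without a network, and the response fields other than nextUri (data, error, stats) follow the open-source Trino protocol and should be treated as assumptions here. Authentication and HTTP headers are left out.

```python
from typing import Any, Callable, Dict, Iterator, List

Response = Dict[str, Any]


def iter_rows(submit: Callable[[], Response],
              follow: Callable[[str], Response]) -> Iterator[List[Any]]:
    """Yield result rows, following nextUri until the coordinator omits it."""
    response = submit()  # send the SQL text and read the first response
    while True:
        if "error" in response:
            # Assumed shape: a failed query is reported in the response body.
            raise RuntimeError(response["error"].get("message", "query failed"))
        # A response can carry no rows, for example while the query is queued.
        yield from response.get("data") or []
        next_uri = response.get("nextUri")
        if next_uri is None:
            return  # no further URI: all data has been returned
        response = follow(next_uri)  # request the next batch of data


# In-memory stand-in for a coordinator; the URIs are made up.
FAKE_PAGES: Dict[str, Response] = {
    "start": {"nextUri": "page/1", "data": [[1]]},
    "page/1": {"nextUri": "page/2", "data": [[2], [3]]},
    "page/2": {"stats": {"state": "FINISHED"}},
}

rows = list(iter_rows(lambda: FAKE_PAGES["start"], FAKE_PAGES.__getitem__))
print(rows)  # [[1], [2], [3]]
```

In a real client, submit would send the query text to the coordinator in an HTTP POST request and follow would issue an HTTP GET request against the returned URI. Because iter_rows is a generator, a caller that stops iterating early also stops requesting more data.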
The client protocol has two modes of operation: direct and spooling.
Spooling protocol
The spooling protocol uses an object storage location to store the data for
retrieval by the client. The workers write result set data to the storage in
parallel. The coordinator provides the client only with the URLs of the
individual data segments on the object storage. The spooling protocol provides
compressed variants of the JSON serialization format.
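As a rough sketch of the client side of this flow, the following function downloads a list of segments in parallel, decompresses each one, and concatenates the rows. Several details are assumptions made for illustration: the download callable and the segment URLs are hypothetical, each segment is taken to be a compressed JSON array of rows, and gzip stands in for whichever compression codec the cluster and client actually negotiate.

```python
import gzip
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def read_segments(urls: List[str],
                  download: Callable[[str], bytes]) -> List[List[Any]]:
    """Download every segment, decompress it, and return the rows in order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # map() returns results in input order, so segment order is preserved.
        payloads = list(pool.map(download, urls))
    rows: List[List[Any]] = []
    for payload in payloads:
        # gzip is a stand-in codec (assumption); each segment decodes to rows.
        rows.extend(json.loads(gzip.decompress(payload)))
    return rows


# In-memory stand-in for object storage; the URLs are made up.
FAKE_STORE: Dict[str, bytes] = {
    "seg-0": gzip.compress(json.dumps([[1, "a"]]).encode()),
    "seg-1": gzip.compress(json.dumps([[2, "b"], [3, "c"]]).encode()),
}

print(read_segments(["seg-0", "seg-1"], FAKE_STORE.__getitem__))
# [[1, 'a'], [2, 'b'], [3, 'c']]
```

The coordinator never handles the row data in this sketch. It only hands out the URLs, which is why the spooling protocol reduces coordinator load.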
The spooling protocol has the following characteristics, compared to the direct
protocol:
- Provides higher throughput for data transfer, especially for queries
that return large result sets.
- Lets query processing on the cluster complete sooner, regardless of when
the client retrieves the data, because the client reads results from the
object storage.
- Reduces CPU and I/O load on the coordinator.
- Requires object storage.
- Requires newer client drivers or client applications that support the spooling
protocol and actively request usage of the spooling protocol.
- Clients must have access to the object storage.
- Works with older client drivers and client applications by automatically
falling back to the direct protocol if the spooling protocol is not
supported.
The latest client libraries that are compatible with the spooling protocol
default to spooling. If there is no benefit to spooling, Galaxy falls back
to the direct client protocol.
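Because a query can be answered with either protocol, a client that requests spooling still has to accept direct results. The sketch below shows one way to dispatch on the response. It assumes that a direct response carries its rows as a JSON array in a data field, and that a spooled response carries an object with a segments list in that field. Both shapes are modeled on the open-source Trino protocol and are illustrative assumptions, as is the read_spooled helper that the caller supplies.

```python
from typing import Any, Callable, Dict, List

Rows = List[List[Any]]
Segments = List[Dict[str, Any]]


def rows_from_response(response: Dict[str, Any],
                       read_spooled: Callable[[Segments], Rows]) -> Rows:
    """Return the rows of one response, whichever protocol produced it."""
    data = response.get("data")
    if data is None:
        return []  # no rows in this response
    if isinstance(data, dict):
        # Assumed spooled shape: the rows live in segments on object storage.
        return read_spooled(data.get("segments", []))
    # Assumed direct shape: the rows are inline in the response.
    return list(data)
```

Keeping the check per response, rather than per query, means the client does not need to know in advance which protocol the cluster chose.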
The following client drivers and applications support the spooling protocol:
The spooling protocol is supported for Amazon S3 catalogs.
Direct protocol
The direct protocol transfers all data from the workers to the coordinator, and
from there directly to the client.
The direct protocol has the following characteristics, compared to the spooling
protocol:
- Provides lower throughput, especially for queries that return large result
sets.
- Results in slower query processing completion on the cluster, since data is
provided by the coordinator and read by the client sequentially.
- Requires no object storage.
- Increases CPU and I/O load on the coordinator.
- Works with older client drivers and client applications without support for
the spooling protocol.