General properties#
All properties described on this page are defined as follows, depending on the deployment type:
Kubernetes: In the additionalProperties section of the top-level coordinator and worker nodes in the values.yaml file.
Starburst Admin: In the files/coordinator/config.properties.j2 and files/worker/config.properties.j2 files.
join-distribution-type
#
Type: string
Allowed values: AUTOMATIC, PARTITIONED, BROADCAST
Default value: AUTOMATIC
Session property: join_distribution_type
The type of distributed join to use. When set to PARTITIONED, SEP uses
hash-distributed joins. When set to BROADCAST, it broadcasts the right table to
all nodes in the cluster that have data from the left table.
Partitioned joins require redistributing both tables using a hash of the join
key. This can be slower, sometimes substantially, than broadcast joins, but
allows much larger joins. Broadcast joins are faster in particular when the
right table is much smaller than the left. However, broadcast joins require that
the tables on the right side of the join, after filtering, fit in memory on each
node, whereas partitioned joins only need to fit in distributed memory across
all nodes. When set to AUTOMATIC, SEP makes a cost-based decision as to which
distribution type is optimal. It also considers switching the left and right
inputs to the join. In AUTOMATIC mode, SEP defaults to hash-distributed joins if
no cost can be computed, for example when the tables do not have statistics.
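As a minimal sketch, assuming a Starburst Admin deployment, the following line in the coordinator and worker config.properties.j2 files sets the join strategy for the whole cluster. The value PARTITIONED is chosen for illustration only. For Kubernetes, the same line would go in the additionalProperties section described at the top of this page.

```properties
# Illustrative value, not a recommendation: always use hash-distributed joins
# instead of letting SEP choose based on cost (the default, AUTOMATIC).
join-distribution-type=PARTITIONED
```

Individual queries can still use a different strategy through the join_distribution_type session property.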
redistribute-writes
#
Type: boolean
Default value: true
Session property: redistribute_writes
This property enables redistribution of data before writing. Hashing the data across nodes in the cluster can eliminate the performance impact of data skew when writing. You can disable the property when the output data set is known not to be skewed, which avoids the overhead of hashing and redistributing all the data across the network.
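A minimal sketch of turning redistribution off cluster-wide, assuming the output data of your workloads is known not to be skewed:

```properties
# Assumption: written data is evenly distributed, so the hashing and
# network redistribution step adds overhead without any benefit.
redistribute-writes=false
```

If only some workloads qualify, the redistribute_writes session property allows the same change per session while the cluster default stays at true.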
protocol.v1.alternate-header-name
#
Type: string
The 351 release of Trino changes the HTTP client protocol headers to start with
X-Trino-. Clients for versions 350 and lower expect the HTTP headers to start
with X-Presto-, while newer clients expect X-Trino-. You can support these older
clients by setting this property to Presto.
The preferred approach to migrating from versions earlier than 351 is to update all clients together with the release, or immediately afterwards, and then remove usage of this property.
Use this property only as a temporary measure to assist in your migration efforts.
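For the migration period, the setting described above can be sketched as the following configuration line. The comment is a suggested reminder and not part of the product.

```properties
# Temporary: accept X-Presto- headers from clients of version 350 and lower.
# Remove this line once all clients are updated.
protocol.v1.alternate-header-name=Presto
```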
protocol.v1.prepared-statement-compression.length-threshold
#
Type: integer
Default value: 2048
Prepared statements that are submitted to SEP for processing, and that are longer than the value of this property, are compressed for transport in the HTTP header. This improves handling and avoids failures caused by hitting HTTP header size limits.
protocol.v1.prepared-statement-compression.min-gain
#
Type: integer
Default value: 512
Prepared statement compression is not applied if the size gain is less than the configured value. Smaller statements do not benefit from compression, and are left uncompressed.
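The two prepared statement compression properties work together, so it can help to see them side by side. The values below are hypothetical and only show the syntax. They are not tuning recommendations.

```properties
# Hypothetical values for illustration; the defaults are 2048 and 512.
# Only statements longer than the threshold are considered for compression.
protocol.v1.prepared-statement-compression.length-threshold=4096
# A statement is left uncompressed if compression gains less than this amount.
protocol.v1.prepared-statement-compression.min-gain=1024
```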
File compression and decompression#
SEP uses the aircompressor library to compress and decompress ORC, Parquet, and other files using the LZ4, zstd, Snappy, and other algorithms. By default, the library uses embedded native implementations of these algorithms, which perform better than the JVM-based implementations.
If necessary, this behavior can be deactivated to fall back on JVM-based implementations with the following configuration in the JVM config:
-Dio.airlift.compress.v3.disable-native=true
The library relies on the temporary directory used by the JVM, including the
execution of code in the directory, to load the embedded shared libraries. If
this directory is mounted with noexec, and is therefore not suitable, you can
configure usage of a separate directory with an absolute path set with the
following configuration in the JVM config:
-Daircompressor.tmpdir=/mnt/example
starburst.config-validation.enable
#
Type: boolean
Default value: true
Enables configuration validation, which checks cluster and catalog configuration
files for severe misconfigurations that may cause unwanted behavior. If a
validation rule is violated, the cluster fails to start with an
io.airlift.bootstrap.ApplicationConfigurationException: Configuration errors
error message followed by a list describing the validation rules that failed.