Starburst shared connector features#

A number of Starburst Enterprise platform (SEP) connectors share features and behavior. For example, some use JDBC connections to the data source, support identical configuration option, performance improvements, or security features. These are all documented in the following sections and linked from the connectors which implement a specific feature.

Configuration#

Standard JDBC catalog properties#

A common JDBC connector implementation includes a catalog file with the following properties:

# Connector name (required property)
connector.name=generic-jdbc

# Connection URL (required for JDBC connectors)
connection-url=[connectorprotocol]//<host>:<port>

# Connection credentials
connection-user=USERNAME
connection-password=PASSWORD

While the connector.name is the only required property in a catalog file, JDBC connectors need a specified connection-url. The connections string differs and the supported parameters differs for each connector and depends on the use JDBC driver. The driver is typically included. Documentation for each driver is available with the supplier of the driver. For example, the PostgreSQL connector uses the PostgreSQL JDBC driver and you can learn details from the PostgreSQL JDBC driver documentation.

Typically, username and password credentials are also required and are specified with the separate properties, if desired as secrets.

Performance#

The JDBC connector base includes a number of features to improve connector performance. These features need to be extended in the connector to be available as user facing features.

Table statistics#

Connectors support table and column statistics. These are used for cost based optimizations to improve query processing performance based on the actual data in the data source.

Collection of statistics varies for each data source.

Table and column statistics are enabled by default. You can disable statistics for a catalog by setting the statistics.enabled configuration property to false. On JDBC-based connectors, you can use the metadata.cache-ttl and metadata.cache-missing configuration properties to further control statistics caching.

The statistics can be viewed in SEP using SHOW STATS.

Dynamic filtering#

Dynamic filtering is enabled by default. It causes the connector to wait for dynamic filtering to complete before starting a JDBC query.

You can disable dynamic filtering by setting the property dynamic-filtering.enabled in your catalog properties file to false.

Wait timeout#

By default, table scans on the connector are delayed up to 20 seconds until dynamic filters are collected from the build side of joins. Using a large timeout can potentially result in more detailed dynamic filters. However, it can also increase latency for some queries.

You can configure the dynamic-filtering.wait-timeout property in your catalog properties file:

dynamic-filtering.wait-timeout=1m

You can use the dynamic_filtering_wait_timeout catalog session property in a specific session:

SET SESSION mycatalog.dynamic_filtering_wait_timeout = 1s;

Compaction#

The maximum size of dynamic filter predicate, that is pushed down to the connector during table scan for a column, is configured using the domain-compaction-threshold property in the catalog properties file:

domain-compaction-threshold=100

You can use the domain_compaction_threshold catalog session property:

SET SESSION domain_compaction_threshold = 10;

By default, domain-compaction-threshold is set to 32. When the dynamic predicate for a column exceeds this threshold, it is compacted into a single range predicate.

For example, if the dynamic filter collected for a date column dt on the fact table selects more than 32 days, the filtering condition is simplified from dt IN ('2020-01-10', '2020-01-12',..., '2020-05-30') to dt BETWEEN '2020-01-10' AND '2020-05-30'. Using a large threshold can result in increased table scan overhead due to a large IN list getting pushed down to the data source.

Metrics#

Metrics about dynamic filtering are reported in a JMX table for each specific catalog mycatalog using any connector:

jmx.current."com.starburstdata.presto.plugin.jdbc.dynamicfiltering:name=mycatalog,type=dynamicfilteringstats"

For the Snowflake connector, the JMX table also includes the name:

jmx.current."com.starburstdata.presto.plugin.jdbc.dynamicfiltering:catalog=snowflake,name=snowflake,type=dynamicfilteringstats"

Metrics include information about the total number of dynamic filters, the number of completed dynamic filters, the number of awaitable dynamic filters and the time spent waiting for dynamic filters.

JDBC connection pooling#

JDBC connection pooling is disabled by default. When enabled each worker creates and maintains a connection pool instead of opening and closing separate connections to the data source sequentially. Each connection is available to connect to the data source and retrieve data. After completion of the operation, the connection is returned to the pool and can be reused. This improves performance by a small amount, and reduces the load on any required authentication system used for establishing the connection.

You can enable JDBC connection pooling by setting connection-pool.enabled=true in your catalog properties file.

Additional configuration properties to tune the connection pooling are listed below with their default value.

connection-pool.enabled=true
connection-pool.max-size=10
connection-pool.max-connection-lifetime=30m
connection-pool.pool-cache-ttl=30m
connection-pool.pool-cache-max-size=1000

connection-pool.max-size controls the maximum number of idle and active connections in the pool.

connection-pool.max-connection-lifetime controls the maximum lifetime of a connection. When a connection reaches the timeout, even if recently used, it is removed.

connection-pool.pool-cache-max-size controls the maximum size of the JDBC data source cache.

connection-pool.pool-cache-ttl controls the expiration of a cached data source when they are no longer accessed.