Starburst Delta Lake connector#
The Starburst Delta Lake connector is an extended version of the Delta Lake connector. Its configuration and usage are identical unless noted otherwise.
Requirements#
To connect to Databricks Delta Lake, you need:
Fulfill the Delta Lake connector requirements.
A valid Starburst Enterprise license.
Extensions#
The connector includes all the functionality described in the Delta Lake connector as well as the features and integrations detailed in the following section:
Unity catalog#
The connector supports reading from managed (internal) and unmanaged (external) Delta Lake tables when using Databricks Unity Catalog as a metastore on AWS or Azure.
Note
The Databricks Unity Catalog metastore is available for Delta Lake as a public preview. Reading from views is not supported when using Databricks Unity Catalog as a metastore. Contact Starburst Support with questions or feedback.
To use Unity Catalog metastore, add the following configuration properties to your catalog configuration file:
hive.metastore=unity
delta.security=read_only
delta.metastore.unity.host=<unity catalog hostname>
delta.metastore.unity.access-token=<token>
The following table shows the configuration properties used to connect SEP to Unity Catalog as a metastore.
| Property name | Description |
| --- | --- |
| `delta.metastore.unity.host` | Name of the host, without the `http(s)` prefix. |
| `delta.metastore.unity.access-token` | The token used to authenticate a connection to the Unity Catalog metastore. For more information about generating access tokens, see the Databricks documentation. |
| `delta.metastore.unity.catalog-name` | (Optional) Name of the catalog in Databricks. Defaults to `main`. |
Enable OAuth 2.0 token pass-through#
The Unity Catalog supports OAuth 2.0 token pass-through.
To enable OAuth 2.0 token pass-through:
Add the following configuration properties to the config.properties file on the coordinator:
http-server.authentication.type=DELEGATED-OAUTH2
web-ui.authentication.type=DELEGATED-OAUTH2
http-server.authentication.oauth2.scopes=<AzureDatabricks-ApplicationID>/.default,openid
http-server.authentication.oauth2.additional-audiences=<AzureDatabricks-ApplicationID>
Replace <AzureDatabricks-ApplicationID> with the Application ID of your Azure Databricks Microsoft application, which can be found in the Azure Portal under Enterprise applications.
Add only the following configuration properties to the delta.properties catalog configuration file:
delta.metastore.unity.authentication-type=OAUTH2_PASSTHROUGH
delta.security=unity
hive.metastore-cache-ttl=0s
Limitations:
Credential passthrough is only supported with Azure Databricks, and only when Microsoft Entra ID is the identity provider.
When credential passthrough is enabled, you cannot use Hive passthrough.
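Conceptually, token pass-through means SEP does not authenticate to Unity Catalog with the statically configured access token; it forwards the OAuth 2.0 bearer token that the client presented to SEP. The following Python sketch is illustrative only; the function name and header handling are assumptions, not SEP internals:

```python
# Illustrative sketch of OAuth 2.0 token pass-through: the caller's own
# bearer token is reused for the downstream Unity Catalog request.
# Names are hypothetical and not part of SEP.

def build_unity_request_headers(client_headers, static_token=None):
    """Pick the Authorization header for a downstream metastore call.

    With pass-through enabled, the client's bearer token is forwarded;
    otherwise a statically configured access token (the
    delta.metastore.unity.access-token property) is used instead.
    """
    auth = client_headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        # Pass-through: reuse the token the client presented to SEP.
        return {"Authorization": auth}
    if static_token is not None:
        # Fallback: static access token from the catalog configuration.
        return {"Authorization": f"Bearer {static_token}"}
    raise ValueError("no credentials available for Unity Catalog request")
```

This also shows why `hive.metastore-cache-ttl=0s` is required in the catalog configuration: cached metastore responses would otherwise be shared across users holding different tokens.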
Location alias mapping#
If you use Unity Catalog as a metastore and access external tables, the Starburst Delta Lake connector supports using a bucket-style alias for your Amazon S3 bucket access point.
To enable location alias mapping:
Create a bucket alias mapping file in JSON format:
{
"bucket_name_1": "bucket_alias_1",
"bucket_name_2": "bucket_alias_2"
}
Add the following properties to your catalog configuration:
location-alias.provider-type=file
location-alias.mapping.file.path=/path_to_bucket_alias_mapping_file
Optionally, use `location-alias.mapping.file.expiration-time` to specify the interval at which SEP rereads the bucket alias mapping file. The default is `1m`.
SEP uses the new external location path specified in the bucket alias mapping file to access the data. Only the bucket name is replaced. The URI is otherwise unchanged.
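The substitution itself is simple URI manipulation. A minimal Python sketch of the behavior just described, assuming the mapping file format shown above (the function is illustrative, not part of SEP):

```python
# Sketch of location alias mapping: only the bucket name in an s3:// URI
# is replaced with its alias; the rest of the URI is left unchanged.
from urllib.parse import urlparse


def apply_bucket_alias(uri, mapping):
    """Replace the bucket name in an s3:// URI with its alias, if mapped."""
    parsed = urlparse(uri)
    alias = mapping.get(parsed.netloc)
    if parsed.scheme != "s3" or alias is None:
        return uri  # no alias configured: leave the location untouched
    return parsed._replace(netloc=alias).geturl()
```

For example, with the mapping `{"bucket_name_1": "bucket_alias_1"}`, the location `s3://bucket_name_1/warehouse/t1` becomes `s3://bucket_alias_1/warehouse/t1`.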
SQL support#
The connector supports all of the SQL statements listed in the Delta Lake connector documentation.
The following improvements are included:
Security operations, see also SQL security
Table replacement, see Replacing tables
SQL security#
You must set the `delta.security` property in your catalog properties file to `sql-standard` in order to use SQL security operation statements. See SQL standard based authorization for more information.
Replacing tables#
The connector supports replacing a table as an atomic operation. Atomic table replacement creates a new snapshot with the new table definition (see CREATE TABLE and CREATE TABLE AS), but keeps table history.
The replaced table is entirely new and separate from the old table; only the table name remains the same.
For example, a partitioned table `my_table` can be replaced with a completely new definition:
CREATE TABLE my_table (
a BIGINT,
b DATE,
c BIGINT)
WITH (partitioned_by = ARRAY['a']);
CREATE OR REPLACE TABLE my_table
WITH (sorted_by = ARRAY['a'])
AS SELECT * from another_table;
Table replacement in the Starburst Delta Lake connector has the following limitations:
Table replacement does not work on append-only Delta Lake tables.
Table replacement does not work for tables with the `change_data_feed_enabled` property set to `true`.
Table replacement does not work if the new table after replacement has the `change_data_feed_enabled` property set to `true`.
Table replacement does not work if the location specified in the table properties is different from the location of the existing table.
Table types must stay the same. For example, table replacement cannot be used to replace a managed table with an external table.
Performance#
The connector includes a number of performance improvements, detailed in the following sections:
Dynamic row filtering#
Dynamic filtering, and specifically dynamic row filtering, is enabled by default. Row filtering improves the effectiveness of dynamic filtering by using dynamic filters to remove unnecessary rows during a table scan. It is especially effective for selective filters on columns that are not used for partitioning or bucketing, or whose values do not naturally appear in any clustered order.
As a result, the amount of data read from storage and transferred across the network is further reduced, improving query performance and lowering cost.
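The idea behind dynamic row filtering can be sketched in a few lines of Python. This is a conceptual illustration, not SEP internals: the distinct join keys collected from the small build side of a join are used to discard probe-side rows during the scan, before the join runs.

```python
# Conceptual sketch of dynamic row filtering: rows whose join key cannot
# match anything on the build side are dropped at scan time.

def scan_with_dynamic_filter(rows, key, dynamic_filter):
    """Yield only rows whose join key appears in the dynamic filter."""
    for row in rows:
        if dynamic_filter is None or row[key] in dynamic_filter:
            yield row


# Build side: a selective dimension produces few distinct keys.
build_side = [{"id": 3}, {"id": 7}]
dynamic_filter = {row["id"] for row in build_side}

# Probe side: the large fact-table scan skips non-matching rows early,
# so less data is read and sent over the network to the join.
fact_rows = [{"id": i, "v": i * 10} for i in range(10)]
filtered = list(scan_with_dynamic_filter(fact_rows, "id", dynamic_filter))
```

Here only 2 of the 10 fact rows survive the scan, which is the reduction in read and transferred data described above.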
You can use the following properties to configure dynamic row filtering:
| Property name | Description |
| --- | --- |
| `dynamic-row-filtering.enabled` | Toggle dynamic row filtering. Defaults to `true`. Catalog session property name is `dynamic_row_filtering_enabled`. |
| `dynamic-row-filtering.selectivity-threshold` | Control the threshold for the fraction of the selected rows from the overall table above which dynamic row filters are not used. Defaults to `0.7`. Catalog session property name is `dynamic_row_filtering_selectivity_threshold`. |
| `dynamic-row-filtering.wait-timeout` | Duration to wait for completion of dynamic row filtering. Defaults to `0`, which causes query processing to proceed without waiting for the dynamic row filter; the filter is collected asynchronously and used as soon as it becomes available. Catalog session property name is `dynamic_row_filtering_wait_timeout`. |
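The selectivity threshold works as a cost cutoff: if a dynamic filter would keep more than the threshold fraction of the table's rows, applying it per row is not worth the overhead, so it is skipped. A hedged sketch of that decision (names are illustrative, not SEP internals):

```python
# Sketch of the selectivity-threshold check: a dynamic row filter is
# only applied when it is selective enough to pay for itself.

def should_apply_row_filter(selected_rows, total_rows, threshold=0.7):
    """Return True when the filter is selective enough to be useful."""
    if total_rows == 0:
        return False  # nothing to filter
    return selected_rows / total_rows <= threshold
```

For example, a filter expected to keep 10% of rows is applied, while one expected to keep 90% of rows is skipped under the default threshold of 0.7.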
Starburst Cached Views#
The connector supports table scan redirection to improve performance and reduce load on the data source.
Security#
The connector includes a number of security-related features, detailed in the following sections.
Built-in access control#
If you have enabled built-in access control for SEP, you must add the following configuration to all Delta Lake catalogs:
delta.security=starburst