Unity Catalog with Delta Lake#
The Delta Lake connector supports a subset of operations across managed, external, and Databricks Unity Catalog-owned tables when using the Databricks Unity Catalog as a metastore on AWS, Azure, or Google Cloud. The following table outlines which operations are supported.
| Supported for | Operation | Notes |
|---|---|---|
| External tables | `CREATE TABLE`, `INSERT`, `UPDATE`, `MERGE`, `DELETE`, `DROP TABLE`, `READ` | |
| Unity Catalog-owned tables | `INSERT`, `UPDATE`, `MERGE`, `DELETE`, `DROP`, `READ` | A Unity Catalog-owned table is a managed table specific to Unity. These tables are created in Databricks and require the `hive.metastore.unity.catalog-owned-table-enabled` catalog configuration property. |
| Managed tables | `READ` | Managed tables that are not Unity Catalog-owned tables are read-only. |
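For example, external tables support both DDL and DML from SEP. The following is a minimal sketch, assuming a configured Delta Lake catalog named `example` with a schema `example_schema`; all table, column, and bucket names are illustrative:

```sql
-- Create an external table at an explicit storage location
CREATE TABLE example.example_schema.orders (
    order_id bigint,
    status varchar
)
WITH (location = 's3://example-bucket/orders');

-- Operations listed as supported for external tables
INSERT INTO example.example_schema.orders VALUES (1, 'NEW');
UPDATE example.example_schema.orders SET status = 'SHIPPED' WHERE order_id = 1;
DELETE FROM example.example_schema.orders WHERE order_id = 1;
SELECT * FROM example.example_schema.orders;
DROP TABLE example.example_schema.orders;
```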
Configuration#
To use Unity Catalog as a metastore, add the following configuration properties to your catalog configuration file:

```
delta.security=unity
hive.metastore.unity.host=host
hive.metastore.unity.token=token
hive.metastore.unity.catalog-name=main
```
The following table shows the configuration properties used to connect SEP to Unity Catalog as a metastore.

| Property name | Description |
|---|---|
| `hive.metastore.unity.host` | Name of the host, without the `http(s)://` prefix. |
| `hive.metastore.unity.token` | The personal access token used to authenticate a connection to the Unity Catalog metastore. For more information about generating access tokens, see the Databricks documentation. |
| `hive.metastore.unity.catalog-name` | Name of the catalog in Databricks. |
| `hive.metastore.unity.catalog-owned-table-enabled` | Enables support for Databricks Unity Catalog-owned tables. |
Unity Catalog-owned tables#
Note
Write support for Unity Catalog-owned tables is currently in private preview and considered experimental by Databricks. While SEP provides limited support for this functionality, it relies on specific table configurations that may change without notice.
When using Unity Catalog with managed tables, add `hive.metastore.unity.catalog-owned-table-enabled=true` to your catalog configuration file. This property enables SEP to recognize and write to catalog-owned tables within Unity Catalog.
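A minimal sketch of the resulting catalog configuration file, combining the properties from the Configuration section with catalog-owned table support; `host`, `token`, and `main` are placeholders for your own values:

```
delta.security=unity
hive.metastore.unity.host=host
hive.metastore.unity.token=token
hive.metastore.unity.catalog-name=main
hive.metastore.unity.catalog-owned-table-enabled=true
```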
Then set the following table properties on the Delta Lake table in Databricks to ensure compatibility with SEP:
```sql
CREATE TABLE catalog_name.schema_name.table_name (
    id int
) USING delta
TBLPROPERTIES (
    'delta.feature.catalogOwned-preview' = 'supported',
    'delta.enableRowTracking' = 'false',
    'delta.checkpointPolicy' = 'classic'
)
```
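With the table created this way, SEP can run the operations listed for Unity Catalog-owned tables against it. A minimal sketch, assuming the SEP Delta Lake catalog is named `example` and maps to the Databricks catalog above; all names are illustrative:

```sql
-- Write to and read from the Unity Catalog-owned table through SEP
INSERT INTO example.schema_name.table_name VALUES (1);
UPDATE example.schema_name.table_name SET id = 2 WHERE id = 1;
SELECT * FROM example.schema_name.table_name;
DELETE FROM example.schema_name.table_name WHERE id = 2;
```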
Enable OAuth 2.0 token passthrough#
Unity Catalog supports OAuth 2.0 token passthrough.
To enable OAuth 2.0 token passthrough:
1. Add the following configuration properties to the `config.properties` file on the coordinator:

   ```
   http-server.authentication.type=DELEGATED-OAUTH2
   web-ui.authentication.type=DELEGATED-OAUTH2
   http-server.authentication.oauth2.scopes=<AzureDatabricks-ApplicationID>/.default,openid
   http-server.authentication.oauth2.additional-audiences=<AzureDatabricks-ApplicationID>
   ```

   Replace `<AzureDatabricks-ApplicationID>` with the Application ID for your Azure Databricks Microsoft application, which you can find in the Azure portal under Enterprise applications.
2. Add only the following configuration properties to the `delta.properties` catalog configuration file:

   ```
   delta.metastore.unity.authentication-type=OAUTH2_PASSTHROUGH
   delta.security=unity
   hive.metastore-cache-ttl=0s
   ```
Limitations:

- Credential passthrough is only supported with Azure Databricks, and only when Microsoft Entra is the IdP.
- When credential passthrough is enabled, you cannot use Hive Passthrough.
Location alias mapping#
If you are using Unity Catalog as a metastore when accessing external tables, the Starburst Delta Lake connector supports using a bucket-style alias for your Amazon S3 bucket access point.
To enable location alias mapping:
1. Create a bucket alias mapping file in JSON format:

   ```json
   {
     "bucket_name_1": "bucket_alias_1",
     "bucket_name_2": "bucket_alias_2"
   }
   ```
2. Add the following properties to your catalog configuration:

   ```
   location-alias.provider-type=file
   location-alias.mapping.file.path=/path_to_bucket_alias_mapping_file
   ```
3. Optionally, use `location-alias.mapping.file.expiration-time` to specify the interval at which SEP rereads the bucket alias mapping file. The default is `1m`.
SEP uses the new external location path specified in the bucket alias mapping file to access the data. Only the bucket name is replaced. The URI is otherwise unchanged.
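For example, with the mapping file shown above, a table location that references `bucket_name_1` is accessed through its alias; the rest of the URI is untouched (paths are illustrative):

```
s3://bucket_name_1/warehouse/sales/    (location stored in the metastore)
s3://bucket_alias_1/warehouse/sales/   (location SEP uses to access the data)
```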