Unity Catalog with Delta Lake#

The Delta Lake connector supports a subset of operations across managed, external, and Databricks Unity Catalog-owned tables when using the Databricks Unity Catalog as a metastore on AWS, Azure, or Google Cloud. The following table outlines which operations are supported.

Supported operations for Unity Catalog with Delta Lake#

| Supported for | Operation | Notes |
| --- | --- | --- |
| External tables | CREATE TABLE, INSERT, UPDATE, MERGE, DELETE, DROP TABLE, READ | |
| Unity Catalog-owned tables | INSERT, UPDATE, MERGE, DELETE, DROP TABLE, READ | A Unity Catalog-owned table is a managed table specific to Unity Catalog. These tables are created in Databricks and require the delta.feature.catalogOwned-preview table feature. |
| Managed tables | READ | Managed tables that are not Unity Catalog-owned tables are read-only. |

Configuration#

To use Unity Catalog as a metastore, add the following configuration properties to your catalog configuration file:

delta.security=unity
hive.metastore.unity.host=host
hive.metastore.unity.token=token
hive.metastore.unity.catalog-name=main

The following table shows the configuration properties used to connect SEP to Unity Catalog as a metastore.

Unity configuration properties#

| Property name | Description |
| --- | --- |
| hive.metastore.unity.host | Name of the host without the http(s):// prefix. For example: dbc-a1b2345c-d6e7.cloud.databricks.com |
| hive.metastore.unity.token | The personal access token used to authenticate a connection to the Unity Catalog metastore. For more information about generating access tokens, see the Databricks documentation. |
| hive.metastore.unity.catalog-name | Name of the catalog in Databricks. |
| hive.metastore.unity.catalog-owned-table-enabled | Enables support for Databricks Unity Catalog-owned tables. |
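Putting these properties together, a minimal catalog configuration file might look like the following sketch. The host, token, and catalog name are placeholder values, and connector.name=delta_lake is assumed to match your other Delta Lake catalogs:

```properties
# Hypothetical example values; replace with your workspace details.
connector.name=delta_lake
delta.security=unity
hive.metastore.unity.host=dbc-a1b2345c-d6e7.cloud.databricks.com
hive.metastore.unity.token=dapi0123456789abcdef
hive.metastore.unity.catalog-name=main
hive.metastore.unity.catalog-owned-table-enabled=true
```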

Unity Catalog-owned tables#

Note

Write support for Unity Catalog-owned tables is currently in private preview and considered experimental by Databricks. While SEP provides limited support for this functionality, it relies on specific table configurations that may change without notice.

To use Unity Catalog-owned tables, add hive.metastore.unity.catalog-owned-table-enabled=true to your catalog configuration file. This property enables SEP to recognize and write to catalog-owned tables within Unity Catalog.

Then set the following table properties on the Delta Lake table in Databricks to ensure compatibility with SEP:

CREATE TABLE catalog_name.schema_name.table_name (
  id int
)
USING delta
TBLPROPERTIES (
  'delta.feature.catalogOwned-preview' = 'supported',
  'delta.enableRowTracking' = 'false',
  'delta.checkpointPolicy' = 'classic'
)
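With the table created in Databricks and catalog-owned table support enabled, the write operations listed in the supported-operations table can be issued from SEP. A sketch, assuming a hypothetical SEP catalog named example that points at this Unity Catalog:

```sql
-- Hypothetical catalog and schema names; the table itself was created in Databricks.
INSERT INTO example.schema_name.table_name VALUES (1), (2);
UPDATE example.schema_name.table_name SET id = 3 WHERE id = 2;
DELETE FROM example.schema_name.table_name WHERE id = 1;
```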

Enable OAuth 2.0 token passthrough#

Unity Catalog supports OAuth 2.0 token passthrough.

To enable OAuth 2.0 token passthrough:

  1. Add the following configuration properties to the config.properties file on the coordinator:

    http-server.authentication.type=DELEGATED-OAUTH2
    web-ui.authentication.type=DELEGATED-OAUTH2
    http-server.authentication.oauth2.scopes=<AzureDatabricks-ApplicationID>/.default,openid
    http-server.authentication.oauth2.additional-audiences=<AzureDatabricks-ApplicationID>
    

Replace <AzureDatabricks-ApplicationID> with the Application ID for your Azure Databricks Microsoft application, which you can find in the Azure Portal under Enterprise applications.

  2. Add only the following configuration properties to the delta.properties catalog configuration file:

    delta.metastore.unity.authentication-type=OAUTH2_PASSTHROUGH
    delta.security=unity
    hive.metastore-cache-ttl=0s
    

Limitations:

  • Credential passthrough is only supported with Azure Databricks, and only when Microsoft Entra ID is the identity provider (IdP).

  • When credential passthrough is enabled, you cannot use Hive passthrough.

Location alias mapping#

If you use Unity Catalog as a metastore when accessing external tables, the Starburst Delta Lake connector supports using a bucket-style alias for your Amazon S3 bucket access point.

To enable location alias mapping:

  1. Create a bucket alias mapping file in JSON format:

    {
      "bucket_name_1": "bucket_alias_1",
      "bucket_name_2": "bucket_alias_2"
    }

  2. Add the following properties to your catalog configuration:

    location-alias.provider-type=file
    location-alias.mapping.file.path=/path_to_bucket_alias_mapping_file

  3. Optionally, use location-alias.mapping.file.expiration-time to specify the interval at which SEP rereads the bucket alias mapping file. The default is 1m.

SEP uses the new external location path specified in the bucket alias mapping file to access the data. Only the bucket name is replaced. The URI is otherwise unchanged.
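The substitution can be illustrated with a short sketch. This is a simplified model of the behavior described above, not SEP's implementation: only the bucket component of the S3 URI is swapped for its alias, and the scheme and object key are left untouched.

```python
from urllib.parse import urlparse

def apply_bucket_alias(uri: str, aliases: dict[str, str]) -> str:
    """Replace only the bucket name in an S3-style URI with its alias.

    The scheme (s3://) and the object key after the bucket are unchanged.
    Buckets without an alias entry pass through untouched.
    """
    parsed = urlparse(uri)
    alias = aliases.get(parsed.netloc, parsed.netloc)
    return parsed._replace(netloc=alias).geturl()

aliases = {"bucket_name_1": "bucket_alias_1"}
print(apply_bucket_alias("s3://bucket_name_1/warehouse/table/part-0.parquet", aliases))
# s3://bucket_alias_1/warehouse/table/part-0.parquet
```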