Unity Catalog with Iceberg

When you use Databricks Unity Catalog as a metastore on AWS, Azure, or Google Cloud, the Iceberg connector can read from and write to external tables, and can read from managed tables.

Configuration

To use Unity Catalog as the metastore, add the following configuration properties to your catalog configuration file:

connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=https://dbc-12345678-9999.cloud.databricks.com/api/2.1/unity-catalog/iceberg-rest
iceberg.rest-catalog.security=OAUTH2
iceberg.rest-catalog.oauth2.token=<OAUTH_TOKEN>

The Iceberg Unity Catalog integration uses an Iceberg REST catalog. For more information about REST catalog configuration properties, see the Iceberg REST catalog documentation.

Server-side scan planning

When you query an Iceberg table, SEP must determine which data files to read to satisfy the query. This process is called scan planning. By default, SEP plans scans on the client by reading table metadata directly. Databricks Unity Catalog can instead perform scan planning on the server and return the result to SEP. Server-side scan planning is required to enforce Databricks Unity Catalog fine-grained access control, such as column masks and row filters. SEP uses server-side scan planning automatically when a REST catalog advertises support for it on a table; otherwise, SEP uses client-side scan planning.

To use server-side scan planning, set iceberg.rest-catalog.vended-credentials-enabled to true in the catalog configuration file. SEP needs vended credentials to read the data files that Databricks Unity Catalog returns during planning; without them, queries may fail.
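A minimal sketch of a complete catalog configuration file with vended credentials enabled, assuming the same example workspace URL and <OAUTH_TOKEN> placeholder used in the configuration above (replace both with values for your environment):

```properties
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=https://dbc-12345678-9999.cloud.databricks.com/api/2.1/unity-catalog/iceberg-rest
iceberg.rest-catalog.security=OAUTH2
iceberg.rest-catalog.oauth2.token=<OAUTH_TOKEN>
# Lets SEP read the data files that Unity Catalog returns from server-side scan planning
iceberg.rest-catalog.vended-credentials-enabled=true
```

Only the last property differs from the basic configuration; the other properties are unchanged.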

Note

Server-side scan planning can significantly increase query latency because the request to the Databricks Unity Catalog scan planning API adds to split generation time. This cost applies only to tables that Databricks Unity Catalog plans on the server.

Verify planning mode

SEP reports two scan operator metrics, serverSideScanCount and clientSideScanCount, that count how many scans are planned on the server versus the client. Use these metrics to confirm which planning mode a query used.

Limitations

- Server-side scan planning is only supported with Databricks Unity Catalog. Other REST catalogs use client-side scan planning.
- Table statistics are not collected, so the cost-based optimizer cannot reorder joins in queries against tables that use server-side scan planning.
- Case-sensitive identifiers are not supported.
- Time travel is not supported.
- Metadata tables, such as $snapshots, $files, $manifests, $partitions, and $refs, cannot be queried.