Share data products
Data product sharing lets users publish a data product from one cluster and make it available to users on other clusters. This feature enables read-only access to datasets across clusters without duplicating data workflows or redefining data products. Sharing is supported for both Hive and Iceberg data products.
When a data product is shared, it is owned and managed by a publisher cluster and consumed by a subscriber cluster:
The publisher is the cluster where the data product is created, refreshed, and managed.
The subscriber is a cluster that consumes the shared data product.
The publisher is the authoritative source of the data product. Any changes made on the publisher cluster are automatically reflected on all subscriber clusters. Subscribers cannot modify the shared data product’s definition, refresh behavior, or underlying data. Data product sharing is implemented as a one-way replication from the publisher to the subscriber. On the subscriber cluster, the data product is materialized as storage tables with a logical view on top.
When the publisher refreshes the data product, updated data becomes available to subscribers. Each subscriber retrieves updates according to its configured refresh schedule. During refresh, the subscriber creates a new version of the storage tables and atomically updates the view.
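The refresh behavior described above can be pictured with a small in-memory model. This is a minimal Python sketch, not SEP's implementation. The class and method names (`SubscribedDataset`, `refresh`, `read_target`, `cleanup`) are hypothetical, and storage tables are represented only by name strings. Each refresh first produces a new storage table version and then repoints the view in one locked step, so a reader resolving the view sees either the old version or the new one, never a half-written table. Replaced versions are kept until a grace period has passed, which loosely mirrors the wait period listed in the general configuration properties.

```python
import threading
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SubscribedDataset:
    """Hypothetical model: versioned storage tables behind a single logical view."""

    name: str
    grace_period_s: float  # stand-in for the wait period before replaced tables are removed
    view_target: Optional[str] = None  # storage table the view currently reads from
    version: int = 0
    replaced: list[tuple[str, float]] = field(default_factory=list)  # (table, replaced_at)
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def refresh(self, now: float) -> str:
        """Build the next storage table version, then repoint the view in one step."""
        self.version += 1
        new_table = f"{self.name}_v{self.version}"
        # In a real system the new table would be fully populated here, before the swap.
        with self._lock:
            old_table, self.view_target = self.view_target, new_table
        if old_table is not None:
            self.replaced.append((old_table, now))
        return new_table

    def read_target(self) -> Optional[str]:
        """The storage table a query against the view resolves to right now."""
        with self._lock:
            return self.view_target

    def cleanup(self, now: float) -> list[str]:
        """Drop replaced tables whose grace period has elapsed; keep the others."""
        expired = [t for t, at in self.replaced if now - at >= self.grace_period_s]
        self.replaced = [(t, at) for t, at in self.replaced if now - at < self.grace_period_s]
        return expired
```

Two refreshes leave the view on the second version. The first version stays in `replaced` until `cleanup` runs after the grace period.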
Note
The data product sharing feature is available as a public preview in Starburst Enterprise. Contact your Starburst account team with questions or feedback.
Prerequisites
To use and configure data product sharing, make sure the following requirements are met:
Secrets manager: Store credentials in a secrets manager. Keys and secrets cannot be entered as plaintext in the configuration; provide them as secret references instead. A secrets manager is required on subscriber clusters and optional on publisher clusters.
BIAC: SEP’s built-in access control must be configured on both the publisher and subscriber clusters. The user configuring data product sharing must have the sysadmin role.
Job scheduler: The job scheduler is enabled automatically when BIAC is configured. Verify that starburst.jobs.enabled is set to true on at least the subscriber cluster.
Spooling protocol: The spooling protocol uses an object storage location to store data for retrieval by remote clusters. Configure the spooling protocol on the publisher cluster.
Dynamic catalogs: Dynamic catalog management lets you define and manage catalogs directly using SQL, instead of manually updating catalog configuration files. Configure dynamic catalog management on the subscriber cluster. The publisher cluster can use static catalogs.
When subscribing to a shared data product, the subscribing user must have permission to create schemas in the target catalog if a new schema name is provided. Without this permission, the subscription fails.
The role used to connect to the publisher cluster must exist on the publisher cluster and be assigned to the subscribing user.
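The prerequisites above differ between the two cluster roles. The following Python sketch encodes them as a lookup table that you can use as a quick self-check. The requirement labels are hypothetical shorthand for the items in the list, not SEP property names. The sketch reflects only what this section states; for example, the secrets manager is listed only for subscribers because it is optional on publishers.

```python
# Hypothetical shorthand labels for the prerequisites listed above, per cluster role.
REQUIREMENTS: dict[str, set[str]] = {
    "publisher": {"biac", "spooling_protocol"},
    "subscriber": {"biac", "job_scheduler", "dynamic_catalogs", "secrets_manager"},
}


def missing_prerequisites(role: str, configured: set[str]) -> list[str]:
    """Return the prerequisites for `role` that are not yet in `configured`, sorted."""
    if role not in REQUIREMENTS:
        raise ValueError(f"unknown cluster role: {role!r}")
    return sorted(REQUIREMENTS[role] - configured)
```

For example, a subscriber that has only BIAC and the job scheduler in place still needs dynamic catalogs and a secrets manager.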
Configuration
Data product sharing must be enabled on both the publisher and subscriber clusters. Add the following property to your coordinator node configuration files on both the publisher and subscriber clusters, and restart the clusters to enable data product sharing.
starburst.data-product-sharing.enabled=true
When set to true, the cluster can share and subscribe to data products. By
default, this feature is disabled.
On the subscriber cluster’s coordinator node, configure the user and role that own all subscribed datasets:
data-product.sharing.dataset.owner-username=<OWNER_USERNAME>
data-product.sharing.dataset.owner-role=<OWNER_ROLE>
Replace OWNER_USERNAME with the username assigned as owner of all subscribed
datasets on the subscriber cluster. Replace OWNER_ROLE with the role assigned
to OWNER_USERNAME. The subscribing user must have permission to create schemas
in the target catalog.
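A small check of the coordinator configuration can catch a missing or misspelled property before you restart a cluster. The Python sketch below parses simple key=value lines and checks only the properties named in this section. It does not implement the full Java properties syntax, such as line continuations. The function names and the publisher/subscriber role argument are hypothetical. Treating an absent starburst.jobs.enabled as enabled is an assumption based on the job scheduler being enabled automatically with BIAC.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines; skips blank lines and #/! comments (no continuations)."""
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def sharing_config_problems(text: str, role: str) -> list[str]:
    """Check the data product sharing properties for a 'publisher' or 'subscriber' coordinator."""
    if role not in ("publisher", "subscriber"):
        raise ValueError(f"unknown cluster role: {role!r}")
    props = parse_properties(text)
    problems = []
    if props.get("starburst.data-product-sharing.enabled", "false").lower() != "true":
        problems.append("starburst.data-product-sharing.enabled must be true")
    if role == "subscriber":
        for key in (
            "data-product.sharing.dataset.owner-username",
            "data-product.sharing.dataset.owner-role",
        ):
            if not props.get(key):
                problems.append(f"{key} must be set")
        # Assumption: absent means enabled, since BIAC enables the job scheduler automatically.
        if props.get("starburst.jobs.enabled", "true").lower() != "true":
            problems.append("starburst.jobs.enabled must be true")
    return problems
```

A subscriber configuration that enables sharing and sets only the owner username produces one problem: the missing owner role. The same text passes as a publisher configuration.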
General configuration properties
The following table lists general configuration properties for data product sharing. Add the relevant properties to the appropriate configuration file.
| Property name | Description | Default |
|---|---|---|
| starburst.data-product-sharing.enabled | Enables data product sharing on the cluster. When set to true, the cluster can share and subscribe to data products. | false |
| data-product.sharing.dataset.owner-username | The username assigned as the owner of all subscribed datasets. Any required privileges are granted to this user. Additionally, this user is the definer of local views for all subscribed datasets. This property is required, and the user must have a role set with data-product.sharing.dataset.owner-role. | |
| data-product.sharing.dataset.owner-role | The role that is assigned to the owner user defined with data-product.sharing.dataset.owner-username. | |
| | The maximum number of threads used to clean up unsubscribed or expired storage tables. This property is optional. | |
| | The frequency at which cleanup tasks remove obsolete storage tables. This property is optional. | |
| | The wait period before permanently removing replaced storage tables. This property is optional. | |
| | The maximum number of refresh attempts before a subscribed dataset refresh is marked as failed. This property is optional. | |
| | The minimum frequency at which refresh tasks for subscribed datasets are scheduled. This property is optional. | |
| | The duration that cached metadata remains valid before it is refreshed from the publisher cluster. This property is optional. | |
| | The delay between retry attempts when metadata synchronization fails. | |
| | The maximum number of metadata synchronization retry attempts before giving up. This property is optional. | |
| | The maximum number of threads used to synchronize metadata concurrently. This property is optional. | |
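The retry delay and the maximum number of retry attempts together bound how long a cluster keeps retrying a failed metadata synchronization. The Python sketch below shows one common way to apply such settings: a fixed delay between attempts with a cap on retries. It illustrates the idea under that assumption and is not SEP's implementation. The function name and parameters are hypothetical. This section does not state whether the first attempt counts as a retry, so the sketch assumes it does not.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    operation: Callable[[], T],
    max_retries: int,
    delay_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying up to `max_retries` times with a fixed delay between tries.

    The first call is not counted as a retry, so at most max_retries + 1 calls are made.
    """
    attempt = 0
    while True:
        try:
            return operation()
        except Exception:
            if attempt >= max_retries:
                raise  # give up and surface the last failure
            attempt += 1
            sleep(delay_s)
```

Under these assumptions, a synchronization that keeps failing with 3 retries and a 10-second delay is abandoned after 4 attempts. That adds about 3 × 10 = 30 seconds of waiting on top of the time the attempts themselves take.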
Share data products
Use the following steps to share a data product from a publisher cluster so that other clusters can subscribe to it.
Create a shared domain
Follow the documentation to create a domain, and select the Is shared domain checkbox. Only data products in shared domains can be subscribed to by other clusters. Data products in private domains are not available to subscribers. Once a domain is shared, it cannot be made private. Shared data products can be moved across shared and private domains.
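The domain rules above fit in a few lines of code. The Python sketch below is a hypothetical model; the `Domain` class and its helper functions are not an SEP API. It captures three rules from the paragraph: a domain can change from private to shared but not back, only data products in shared domains are offered to subscribers, and a data product can move between shared and private domains.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """Hypothetical model of a data product domain on the publisher cluster."""

    name: str
    shared: bool = False
    products: set[str] = field(default_factory=set)

    def set_shared(self, shared: bool) -> None:
        # A shared domain cannot be made private again.
        if self.shared and not shared:
            raise ValueError(f"domain {self.name!r} is shared and cannot be made private")
        self.shared = shared


def subscribable_products(domains: list[Domain]) -> set[str]:
    """Data products other clusters can subscribe to: only those in shared domains."""
    return {p for d in domains if d.shared for p in d.products}


def move_product(product: str, source: Domain, target: Domain) -> None:
    """Move a data product between domains; shared and private targets are both allowed."""
    source.products.remove(product)
    target.products.add(product)
```

Moving a data product from a shared domain into a private one removes it from `subscribable_products` in this model. The sketch does not model what happens to existing subscriptions.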
Create a subscriber role
Create a role for the subscriber that grants access to the publisher cluster. In data product sharing, the subscriber uses a role that is defined on the publisher cluster. This role defines what the subscriber can access on the publisher cluster.
From the SEP navigation menu, select Access control > Roles and privileges.
Click Create role.
In the Add new role dialog, specify the Role name. Optionally, you can add a role description.
Click Save.
In the Roles and privileges pane, select the newly created role name.
In the Roles pane of the selected role, click Add privileges.
In the What would you like to modify privileges for? section, select the Data products radio button.
Choose the shared domain or all domains, and select the data products you want to share.
Select Allow.
Choose the desired privileges.
Click Save privileges.
Read the BIAC privileges and BIAC roles documentation for more information.
The role used to connect to the publisher cluster must exist on the publisher cluster and be assigned to the subscribing user.
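The subscriber role narrows what a subscriber can see: a data product must be in a shared domain and also covered by a privilege granted to the role on the publisher. The Python sketch below models that intersection with hypothetical names. It assumes a grant names either a specific data product or all data products, in one domain or in all domains. This mirrors the choice between the shared domain and all domains in the steps above. The sketch ignores deny rules and individual privilege types.

```python
WILDCARD = "*"  # hypothetical marker for "all domains" or "all data products"


def is_granted(domain: str, product: str, grants: set[tuple[str, str]]) -> bool:
    """True if any (domain, product) grant matches, allowing WILDCARD in either position."""
    return any(
        gd in (domain, WILDCARD) and gp in (product, WILDCARD) for gd, gp in grants
    )


def visible_to_subscriber(
    products: dict[str, tuple[str, bool]],
    grants: set[tuple[str, str]],
) -> set[str]:
    """Data products a subscriber using this role can see.

    `products` maps a data product name to (domain name, whether that domain is shared).
    """
    return {
        name
        for name, (domain, shared) in products.items()
        if shared and is_granted(domain, name, grants)
    }
```

Even a grant on all domains and all data products does not expose a data product that sits in a private domain.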
Assign users
Next, assign users to the created subscriber role. At least one user must be assigned to the role. The subscribing user must have permission to create schemas in the target catalog.
From the Roles and privileges pane, next to the subscriber role, click the options menu and select Assign.
Add the user that the subscriber cluster uses for authentication.
If you are assigning a newly created role, in the Assign to role dialog, select the Entity category and Role.
Click Assign.
Move data products
To move data products on the publisher cluster to the shared domain, follow the steps in the Domain management documentation.
Subscribe to data products
To subscribe to a shared data product, make sure the subscribing cluster meets the following requirements:
Data product sharing is enabled and properly configured on both the subscriber and publisher clusters.
The required permissions and roles are configured.
You have completed the steps in the Share data products section.
Connect to a remote cluster
On the subscriber cluster, open the Remote clusters pane in the Data product dashboard,
make sure you are using the sysadmin role, and follow these steps:
In the Remote clusters dialog, click Connect to cluster.
Enter the following connection details:
Connection name: Name of the connection used to reference the remote cluster. This does not need to match the name of the actual cluster you want to connect to.
Connection URL: The URL and port of the publisher cluster. For example, domain.starburst.net:443.
In the Remote cluster authentication section:
Username: The user assigned to the subscriber role on the publisher.
Password secret: The secret reference from your secrets manager.
Role: Subscriber role defined on the publisher.
Click Connect.
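You can sanity-check the connection details before clicking Connect. The Python sketch below splits a Connection URL in the bare host:port form shown above, such as domain.starburst.net:443. It also rejects a password that does not look like a secret reference. The function is hypothetical, and so is the secret-reference pattern: the sketch treats any value wrapped in ${...} as a reference, which may not match the syntax your secrets manager integration uses.

```python
import re

# Bare host:port, as in the example above (for example, domain.starburst.net:443).
HOST_PORT = re.compile(r"^([A-Za-z0-9.-]+):([0-9]{1,5})$")
# Assumption: secret references are written as ${...}; match your secrets manager's syntax.
SECRET_REF = re.compile(r"^\$\{[^{}]+\}$")


def check_connection(url: str, password_secret: str) -> tuple[str, int]:
    """Validate the connection form fields and return (host, port)."""
    match = HOST_PORT.match(url.strip())
    if not match:
        raise ValueError(f"expected host:port, got {url!r}")
    host, port = match.group(1), int(match.group(2))
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    if not SECRET_REF.match(password_secret.strip()):
        raise ValueError("password must be a secret reference, not a plaintext value")
    return host, port
```

A URL without a port, or a password typed in as plaintext, raises `ValueError` instead of being submitted.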
Subscribe
Once connected, data products available for subscription appear. To subscribe to a shared data product, follow these steps:
From the data products that are labeled Available to subscribe, click Subscribe on the data product you want to subscribe to.
Enter the following details in the Subscribe to dialog:
Catalog: Use the drop-down list to select the catalog in which to store the schema.
Schema name: Enter the name of the schema to create in the selected catalog. The schema cannot already exist, and the user must have permissions to create schemas in the selected catalog.
In the Data product owners section:
Owner name: Enter the username that owns the subscribed schema and datasets.
Owner email: Enter the email associated with the owner.
You can add multiple owners to the subscribed data product.
In the Refresh schedule section:
From the drop-down list, choose your desired Time zone for evaluating the refresh schedule cron expression.
Choose a recurring interval format: Select frequency or Enter cron expression.
Optionally, select the Override default storage schema checkbox and choose the specific schema where the storage table is created from the drop-down list.
Click Subscribe and publish.
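The Time zone setting determines the wall-clock time at which the cron expression fires. The Python sketch below computes the next run in a given time zone for a restricted cron form: numeric minute and hour fields, with * for the remaining three fields. It is a simplification for illustration, not the scheduler SEP uses. It does not handle full cron syntax such as lists, ranges, and steps. It also does not handle daylight-saving transitions, although a `zoneinfo.ZoneInfo` instance can be passed as `tz` subject to that caveat.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def next_daily_run(cron: str, now: datetime, tz: tzinfo) -> datetime:
    """Next run of a 'MINUTE HOUR * * *' cron expression, with the clock read in `tz`.

    Simplified: numeric minute and hour only, '*' for day of month, month, and day of week.
    """
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    fields = cron.split()
    if len(fields) != 5 or any(f != "*" for f in fields[2:]):
        raise ValueError(f"unsupported cron expression: {cron!r}")
    minute, hour = int(fields[0]), int(fields[1])
    if not (0 <= minute <= 59 and 0 <= hour <= 23):
        raise ValueError(f"minute or hour out of range: {cron!r}")
    local_now = now.astimezone(tz)
    candidate = local_now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= local_now:
        candidate += timedelta(days=1)  # today's slot has already passed in this time zone
    return candidate


# Usage: the same instant and expression, evaluated in two time zones.
now = datetime(2024, 1, 1, 23, 30, tzinfo=timezone.utc)
print(next_daily_run("0 2 * * *", now, timezone.utc))                  # 2024-01-02 02:00:00+00:00
print(next_daily_run("0 2 * * *", now, timezone(timedelta(hours=2))))  # 2024-01-02 02:00:00+02:00
```

The second run corresponds to 00:00 UTC, two hours earlier than the first, even though both use the same expression.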
Configure subscriber access control
From the SEP navigation menu, select Access control > Roles and privileges to configure access to subscribed data products for non-sysadmin users.
In the Roles and privileges pane, select the Role name you want to update.
Click Add privileges.
Select the Remote data products radio button to specify which privileges to modify.
From the Connections drop-down list, select the remote connection you want to modify.
Use the Domains drop-down list to select the domain.
From the Data products drop-down list, select the data products you want to modify privileges for.
In the Do you want to allow or deny access? section, select the Allow radio button.
Under What can they do?, select the checkboxes of the permissions you want to be available to the subscriber role.
For more information about access control, see the Data products documentation.
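On the subscriber, the steps above select privileges for remote data products by connection, then domain, then data product. The Python sketch below models such allow rules with hypothetical names. It assumes that anything without a matching allow rule is denied. It also assumes, as the first step suggests, that users with the sysadmin role are not subject to these rules. Privilege names such as SELECT are placeholders for the checkboxes shown under What can they do?, and the connection name is hypothetical.

```python
from typing import NamedTuple


class RemoteProduct(NamedTuple):
    connection: str  # remote connection name from the Remote clusters pane
    domain: str
    product: str


def can_access(
    role: str,
    privilege: str,
    target: RemoteProduct,
    allow_rules: dict[str, set[tuple[RemoteProduct, str]]],
) -> bool:
    """Deny by default; allow only if the role holds an allow rule for this exact target."""
    if role == "sysadmin":  # assumption: sysadmin is not governed by these rules
        return True
    return (target, privilege) in allow_rules.get(role, set())
```

Because the connection is part of the key, the same domain and data product reached through a different remote connection need a separate rule.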
Limitations
Data product sharing is not supported through the Starburst Enterprise platform (SEP) REST API.
Schemas on the subscriber are not deleted when the publisher deletes a data product or revokes access.
Starburst Stargate catalogs to remote clusters are dropped on connection termination. As a result, SEP may retain catalogs pointing to non‑existent catalogs on the subscriber.