Data products overview

Data products are available in the Starburst Enterprise web UI. The Data products tab allows you to publish, find, and manage curated data assets in your organization. Use data products as a semantic layer to abstract the technical complexity of your underlying data sources.

A data product is a schema that contains one or more datasets. Each dataset is a view or, if the catalog supports it, a materialized view. There is no limit on the number of datasets in a data product.

Summary tiles for data products are displayed on the Data products tab, each with a brief description and one of the following statuses:

  • Draft - this data product was never published to SEP

  • Pending changes - this data product was already published to SEP, but its dataset definitions have changes that were not yet published

  • [No status] - this data product is published to SEP and ready for use
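The three statuses follow from two facts about a data product: whether it was ever published, and whether it has unpublished changes. The sketch below models that rule only for illustration. The function and parameter names are hypothetical and are not part of SEP or its API.

```python
from typing import Optional


def data_product_status(ever_published: bool, has_unpublished_changes: bool) -> Optional[str]:
    """Return the status label for a summary tile, or None when no status is shown.

    Hypothetical helper that mirrors the status list above; SEP computes
    the status itself.
    """
    if not ever_published:
        # Never published to SEP, regardless of pending edits.
        return "Draft"
    if has_unpublished_changes:
        # Published before, but dataset definitions changed since then.
        return "Pending changes"
    # Published and up to date: the tile shows no status.
    return None
```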

You can use the query editor to develop, test, and vet your data product.

Note

You cannot directly register pre-existing data as data products. Instead, create a view for that data using a simple SELECT * statement.
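As a rough illustration of the note above, the following sketch builds the kind of pass-through query that could serve as a dataset definition for an existing table. The catalog, schema, and table names are placeholders, and the helper functions are hypothetical. Only the double-quoted identifier style and the SELECT * form are taken from standard SQL as used by SEP.

```python
def quote_identifier(name: str) -> str:
    """Double-quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def passthrough_query(catalog: str, schema: str, table: str) -> str:
    """Build a SELECT * query that exposes an existing table unchanged."""
    qualified = ".".join(quote_identifier(part) for part in (catalog, schema, table))
    return f"SELECT * FROM {qualified}"


# Placeholder names; substitute the table you want to expose.
print(passthrough_query("hive", "raw", "orders"))
# prints: SELECT * FROM "hive"."raw"."orders"
```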

Requirements

Data products require the following:

  • A catalog that uses one of the following:

    • The Hive connector with the Hive Metastore Service (HMS), AWS Glue, or Starburst data catalog. Optionally, enable materialized views.

    • The Iceberg connector with any supported metastore. To use materialized views, you must configure an Iceberg catalog with a metastore that supports materialized views.

  • A SEP user with appropriate access privileges to impersonate other users.

  • A configured and operational backend service.

  • The following property in your coordinator’s config.properties file:

    insights.persistence-enabled=true
    

If you are unsure whether these requirements have been met, ask your data engineer.
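One way to check the config.properties requirement is a small script like the sketch below. It assumes simple key=value lines with "#" comments, which is a simplification of the full Java properties format. The function names and the sample text are illustrative only.

```python
def read_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if separator:
            props[key.strip()] = value.strip()
    return props


def persistence_enabled(config_text: str) -> bool:
    """Return True if insights.persistence-enabled is set to true."""
    value = read_properties(config_text).get("insights.persistence-enabled", "")
    return value.lower() == "true"


# Illustrative file content, not a complete coordinator configuration.
sample = """
# coordinator settings
insights.persistence-enabled=true
"""
print(persistence_enabled(sample))  # prints: True
```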

Access control for data products

The supported access control systems depend on your catalog type:

  • Hive catalogs: Use either built-in access control or Apache Ranger.

  • Iceberg catalogs: Use built-in access control (BIAC). In your BIAC configuration, you must:

    • Enable ownership for the catalog.

    • Ensure that the user who creates the data product has only a single role enabled in the Starburst Enterprise web UI.

    • Ensure that the role has REFRESH privileges for the Iceberg catalog that contains the data product.

Note

Support for data products with Iceberg is in public preview in Starburst Enterprise. Contact Starburst Support with questions or feedback.

For more information, see the built-in access control and Ranger documentation.

Views and materialized views

Create data products as either views or materialized views.

To use materialized views with Hive data products, see the Hive connector documentation.

To use materialized views with Iceberg data products, you must configure an Iceberg catalog with a metastore that supports materialized views, such as HMS, Thrift, Glue, or the Starburst embedded catalog. JDBC, Nessie, REST, and Snowflake catalogs do not support materialized views.

Configuration properties

When you create a data product as a materialized view, the following properties are available:

Materialized view data product configuration properties

| Property name | Description | Default |
| --- | --- | --- |
| refresh_schedule | Cron expression that specifies the schedule for refreshing the materialized view | |
| refresh_schedule_timezone | The timezone for evaluating the refresh schedule cron expression | UTC |
| storage_schema | The schema where the materialized view storage table is created | |
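To illustrate how these properties and the single documented default fit together, here is a minimal sketch that merges user-supplied values with the default and rejects unknown property names. The helper is hypothetical and is not how SEP processes properties. The cron value and schema name in the example are placeholders, and the five-field cron form is an assumption, so confirm the expected cron syntax before using it.

```python
# Property names and the UTC default come from the table above.
KNOWN_PROPERTIES = {"refresh_schedule", "refresh_schedule_timezone", "storage_schema"}
DEFAULTS = {"refresh_schedule_timezone": "UTC"}


def resolve_properties(overrides: dict) -> dict:
    """Merge supplied properties with defaults; fail on unknown names."""
    unknown = set(overrides) - KNOWN_PROPERTIES
    if unknown:
        raise ValueError(f"Unknown properties: {sorted(unknown)}")
    # Later entries win, so explicit overrides replace defaults.
    return {**DEFAULTS, **overrides}


# Placeholder values: refresh daily at 02:00 (assuming five-field cron).
props = resolve_properties({
    "refresh_schedule": "0 2 * * *",
    "storage_schema": "mv_storage",
})
print(props["refresh_schedule_timezone"])  # prints: UTC
```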

Access control for underlying data

Data product privileges do not imply any table privileges for the underlying data objects defined by the data product's datasets. A user can therefore be allowed to see a data product in the data products dashboard without being allowed to query its datasets, and vice versa.

Make sure you define appropriate access control privileges for the data objects defined by your data products. Data product privileges can only be managed by built-in access control and Apache Ranger, but the privileges for the data objects that a data product creates in SEP (schemas, views, and materialized views) can be managed by other supported access control systems.

Note

Data product views are created and stored in the Hive metastore with the permissions and ownership of the user who published the data product.

API

You can manage data products, domains, and associated tags programmatically with the Starburst Enterprise REST API.