Data products overview

Data products are available in the Starburst Enterprise web UI. The Data products tab allows you to publish, find, and manage curated data assets in your organization. Use data products as a semantic layer to abstract the technical complexity of your underlying data sources.

A data product is a schema that contains one or more datasets. Each dataset is a view or, if the catalog supports it, a materialized view. There is no limit on the number of datasets in a data product.

Data product summary tiles are displayed on the Data products tab, each with a brief description and a status:

  • Draft - this data product has never been published to SEP.

  • Pending changes - this data product has been published to SEP, but its dataset definitions have changes that are not yet published.

  • [No status] - this data product is published to SEP and ready for use.
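The three statuses follow from two facts about a data product: whether it has ever been published, and whether its dataset definitions have unpublished changes. The following is a minimal sketch of that mapping. The function and parameter names (`tile_status`, `ever_published`, `has_unpublished_changes`) are hypothetical and only model the descriptions above; they are not part of any SEP API.

```python
from typing import Optional


def tile_status(ever_published: bool, has_unpublished_changes: bool) -> Optional[str]:
    """Return the status label for a summary tile, or None when no label is shown.

    Hypothetical helper that mirrors the status descriptions in this page.
    """
    if not ever_published:
        # Never published to SEP, regardless of any local edits.
        return "Draft"
    if has_unpublished_changes:
        # Published before, but dataset definitions changed since then.
        return "Pending changes"
    # Published and up to date: the tile shows no status label.
    return None
```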

You can use the query editor to develop, test, and vet your data product. Be sure to mark your data product before publishing it into production as a data product.

Note

You cannot directly register pre-existing data as data products. Instead, create a view for that data using a simple SELECT * statement.
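As an illustration of the wrapping view described in this note, the sketch below builds the SQL statement for a view that selects everything from an existing table. The helper name `wrapping_view_sql` and the object names in the usage example are hypothetical, and the helper assumes the names passed in are already valid, fully qualified identifiers.

```python
def wrapping_view_sql(view_name: str, source_table: str) -> str:
    """Build a CREATE VIEW statement that exposes an existing table as-is.

    Hypothetical helper: both arguments are assumed to be valid,
    fully qualified identifiers (catalog.schema.object).
    """
    return f"CREATE VIEW {view_name} AS SELECT * FROM {source_table}"


# Example with made-up names: a view in a Hive catalog over an existing table.
statement = wrapping_view_sql("hive.sales.orders_view", "postgresql.public.orders")
print(statement)
```

The resulting view can then serve as the dataset that the data product exposes, while the pre-existing table stays where it is.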

Requirements

Data products require the following:

  • A catalog using the Hive connector with a Hive metastore to manage views. You can optionally enable materialized view support for the catalog to provide performance benefits for data product consumers.

  • A SEP user with appropriate access privileges to impersonate other users.

  • A configured and operational backend service.

  • The following property in your coordinator’s config.properties file:

    insights.persistence-enabled=true
    

If you are unsure whether these requirements have been met, ask your data engineer.
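The configuration property requirement above can be checked with a short script. This is a minimal sketch, assuming you can read the text of the coordinator's config.properties file; the function name `persistence_enabled` is hypothetical, and the parser handles only simple `key=value` lines and comment lines.

```python
def persistence_enabled(properties_text: str) -> bool:
    """Return True if insights.persistence-enabled is set to true.

    Hypothetical helper: parses simple key=value lines only.
    If the key appears more than once, the last occurrence wins.
    """
    enabled = False
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and properties-file comments.
        if not line or line.startswith(("#", "!")):
            continue
        key, separator, value = line.partition("=")
        if separator and key.strip() == "insights.persistence-enabled":
            enabled = value.strip().lower() == "true"
    return enabled
```

For example, pass the function the contents of the file read with `pathlib.Path(...).read_text()`, where the path depends on your installation.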

Access control for data products

If a role-based access control system like built-in access control or Apache Ranger is enabled, you must set roles and privileges for data products. You can control which users are able to see, edit, create, delete, and publish data products. You can set privileges that apply globally (across all domains), privileges that apply to a specific domain, and privileges that apply to a specific data product. For details, see the built-in access control and Ranger documentation.

Note

With access control enabled, only users that you explicitly allow are able to see existing data products.

Access control for the underlying data of data products

Data product privileges do not imply any table privileges for the underlying data objects defined by a data product's datasets. This means a user can be allowed to see a data product in the data products dashboard without being allowed to query its datasets, and vice versa.

Make sure you define appropriate access control privileges for the data objects defined by your data products. While data product privileges can be managed only by built-in access control and Apache Ranger, the privileges for the data objects that a data product creates in SEP (schemas, views, and materialized views) can be managed by other supported access control systems.

Note

Data product views are created and stored in the Hive metastore with the permissions and ownership of the user who published the data product.

API

Data products, domains, and associated tags can be managed programmatically with the Starburst Enterprise REST API.