Data products#

Note

The data products feature is experimental. It is for testing purposes only and is not suitable for production use. Please contact Starburst support with questions or feedback.

Data products are available in Insights. The Data products tab allows you to publish, find, and manage curated data assets in your organization.

A data product is a schema. It can have one or more datasets, which are views. Data products are not limited in the number of datasets that they can contain.

Data products summary tiles are displayed on the Data products tab along with a brief description and their status:

  • Pending changes

  • Draft

  • [No status] - this data product is published and ready for use

You must develop, test, and vet your data outside of Starburst in the usual manner before publishing it as a data product. Once your data is ready, re-create the schema and tables in the Data products UI as a data product with datasets. You cannot register pre-existing schemas with views as data products.

Requirements#

Data products require the following:

  • A catalog using the Hive connector with a Hive metastore to manage views.

  • A user with appropriate access privileges to create schemas and datasets if your organization uses data governance.

If you are unsure whether these requirements have been met, ask your data engineer.

Configuration#

The data products feature is disabled by default.

If you are using Starburst Enterprise platform (SEP) with data governance, you must specify the user name and password of an account that has sufficient privileges and is available through your configured authentication type, such as OAuth2 or LDAP.

Configuration properties for data products are set in the config.properties file. In Kubernetes, set this in the additionalProperties node on the coordinator:

data-product.starburst-jdbc-url=jdbc:trino://coordinator.example.com?SSL=true
data-product.starburst-user=alice
data-product.starburst-password=my123password
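
The JDBC URL carries driver parameters as a query string, such as SSL=true in the example above. As a minimal sketch, assuming you generate the property value from a script, the helper below appends parameters to a base URL. The build_jdbc_url name is hypothetical and not part of the product; it only formats a string, and you should confirm which parameters your JDBC driver version accepts.

```python
from urllib.parse import urlencode


def build_jdbc_url(host: str, params: dict[str, str]) -> str:
    """Return a jdbc:trino URL with driver parameters as a query string.

    Hypothetical helper for illustration; it only formats a string and
    does not open a connection.
    """
    base = f"jdbc:trino://{host}"
    if not params:
        return base
    # urlencode escapes special characters in parameter values.
    return f"{base}?{urlencode(params)}"


url = build_jdbc_url("coordinator.example.com", {"SSL": "true"})
print(f"data-product.starburst-jdbc-url={url}")
```

The printed line can be pasted into config.properties. Additional driver parameters are passed as further dictionary entries.
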
Mapping configuration properties#

Property name: starburst.data-product.enabled
Description: Enable or disable data products.
Default: false

Property name: data-product.starburst-jdbc-url
Description: Required JDBC URL connection string of the cluster that data products is running on. Use JDBC driver parameters to configure details of the connection. For example, append SSL=true for a cluster secured with TLS, or other parameters as needed for authentication, client identification, or client tags.

Property name: data-product.starburst-user
Description: Specific user that is used to create schemas and datasets for all catalogs that contain data products. The specified user must have the following permissions:

  • Read and write access to object storage and metastore

  • Read all source tables used to create datasets

  • Read access to the system catalog

Source tables can be located in all configured and available catalogs and schemas.

Property name: data-product.starburst-password
Description: Password for the user specified by data-product.starburst-user.

Property name: data-product.publishing-threads-count
Description: Number of threads in the coordinator pool allocated to publishing data products. We strongly recommend using the default value unless you observe slowness in concurrent publishing jobs.
Default: 2
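
The properties above can be checked before a deployment. The following is a minimal sketch, assuming the properties are available as key=value text: it applies the two documented defaults and reports a missing JDBC URL. The load_data_product_config function is a hypothetical helper, not a Starburst API, and treating the JDBC URL as always required is an assumption based on the description above.

```python
REQUIRED = ("data-product.starburst-jdbc-url",)
DEFAULTS = {
    "starburst.data-product.enabled": "false",
    "data-product.publishing-threads-count": "2",
}


def load_data_product_config(text: str) -> dict[str, str]:
    """Parse key=value lines and apply the documented defaults."""
    config = dict(DEFAULTS)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first "=" only, because JDBC URLs contain "=" too.
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    missing = [key for key in REQUIRED if key not in config]
    if missing:
        raise ValueError(f"missing required properties: {', '.join(missing)}")
    return config


sample = """
starburst.data-product.enabled=true
data-product.starburst-jdbc-url=jdbc:trino://coordinator.example.com?SSL=true
data-product.starburst-user=alice
"""
config = load_data_product_config(sample)
print(config["data-product.publishing-threads-count"])  # default applied
```
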

Data governance for data products#

As with any view creation, a new object is created. You must create grants and restrictions for it as necessary.

The view is created and stored in the Hive metastore with the permissions and ownership of the data-product.starburst-user. It is important to review all grants and restrictions for this user, particularly for any personally identifiable information (PII), because views do not inherit grants and restrictions on the source data.

The user creating a dataset in the UI may have a different set of roles and grants than data-product.starburst-user. A user querying a dataset in a data product may have yet another set of roles and grants.

Even if the roles and grants on the source data are more restrictive for data product owners and data product users, they can still view all data exposed by the view, including data they are restricted from in the source, until data governance is applied to the view. You must apply roles and grants to the new view as with any other view.

Warning

Data product owners creating datasets always have access to a preview of data with the restrictions and grants applicable to the specified data-product.starburst-user, which may be less restrictive than their own. You must create data governance policies for the views that represent the dataset.

Dashboard#

Data products are sortable, searchable, and filterable in the Dashboard pane. The filters work together to further refine search results.

Sort data products#

Data products can be sorted by the following:

  • A-Z or Z-A by title

  • Descending or ascending by creation date

  • Alphabetical by status

Data domains filter#

Your organization has many data domains, each representing people organized around a common business purpose, such as sales, marketing, or user engagement. Data product creators can assign data products to a domain. Use the Filter by domain dropdown to filter the displayed data products for a particular domain.

Tags filter#

You can assign one or more tags to the data products that you create. Tags can be new or existing. Use the Tags dropdown to select one or more tags to filter the displayed data products.

Search for data products#

The search field allows you to search for a string in the title and description text of data products.
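
The way the dashboard filters combine can be sketched with a small in-memory model. This is an illustration only, not the product's implementation: the DataProduct class, the filter_products function, and the sample data are hypothetical. It assumes that a product matches the tags filter if it carries any selected tag, and that search is a case-insensitive substring match. Neither detail is confirmed here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataProduct:
    title: str
    description: str
    domain: str
    tags: frozenset[str]
    created: date


def filter_products(products, domain=None, tags=frozenset(), search=None):
    """Return the products that satisfy every filter that is set."""
    needle = search.lower() if search else None
    result = []
    for p in products:
        if domain is not None and p.domain != domain:
            continue
        # Assumption: a product matches if it carries any selected tag.
        if tags and not (p.tags & set(tags)):
            continue
        # Assumption: case-insensitive match on title and description.
        if needle and needle not in p.title.lower() and needle not in p.description.lower():
            continue
        result.append(p)
    return result


products = [
    DataProduct("Orders", "Daily order totals", "sales", frozenset({"finance"}), date(2021, 6, 1)),
    DataProduct("Leads", "Inbound marketing leads", "marketing", frozenset({"crm"}), date(2021, 7, 15)),
    DataProduct("Churn", "Monthly churn by order cohort", "sales", frozenset({"crm"}), date(2021, 5, 20)),
]

# Domain filter and search applied together, then sorted A-Z by title.
matches = filter_products(products, domain="sales", search="order")
print([p.title for p in sorted(matches, key=lambda p: p.title)])
```

Each filter that is set narrows the result further, which mirrors how the filters work together in the Dashboard pane.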

View or edit a data product#

In the Data products home screen, click on a tile to view a data product in the Data product details screen.

Overview pane#

In the Overview pane, you can browse the following information about a data product:

  • Catalog

  • Summary

  • Description

  • List of datasets

Each item in the list of datasets can expand and collapse. The expanded view shows a summary of the dataset, including:

  • A short description

  • View name to be used in queries

  • Date last updated

  • Column information

For each column, the following information is provided:

  • Column name

  • Data type

  • Column description (if available)

Usage examples pane#

The Usage examples pane allows you to view previously provided SQL usage examples for a data product, as well as to add new ones.

Each usage example contains a short explanatory text as well as the SELECT statement.

The SELECT statements can be copied for use in the Query data pane using the copy icon in the text box.

To create a new usage example, click the + usage example button, enter a short descriptive text and the SELECT statement. Click Save.

Query data pane#

The Query data pane contains the following:

  • Data browser

  • Query editor

  • Query status

  • Query results

To query data, type or paste in a SELECT statement and click Run.

You can use this query to create a new data product, or as a new dataset in an existing data product. To do so, click the + Data product button. You must select either the “Create a new data product” or “Add to an existing data product” option in the resulting dialog.

If you choose to add to an existing data product, a dropdown appears. Select the existing data product to use for the new dataset and click Continue.

With either selection, you then enter the corresponding data publishing workflow.

Edit data products#

Click the Edit button for a data product to enter its Update data product screen. This workflow is similar to the workflow for creating a data product, except not all fields are editable for everyone.

Warning

You do not have to be the owner of a data product to edit it. Use extreme caution and courtesy when editing data products you do not own. Data is often subject to SLAs and modifying it unnecessarily may adversely affect business goals.

Delete a data product#

You can delete a data product from the Overview pane using the Delete button.

Warning

You do not have to be the owner of a data product to delete it. Use extreme caution and courtesy when deleting data products you do not own. Data is often subject to SLAs and deleting it unnecessarily may adversely affect business goals.

Publish data products#

Click on the Publish data button in the Data products tab to begin creating a data product in the Create data product screen.

Describe your data product#

Enter the required title and description information:

  • Data product title - This is used to generate the schema name used to query the data. This field is limited to 50 characters.

  • Data product description - This unlimited text field allows you to provide a detailed description. Include pertinent information such as grain, intended use, and methodology to help data product users.

Click Save as draft to stop the workflow at this point and mark the data product’s display tile as “Draft”. Click Save and continue to go to the Define datasets screen.
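
The exact rule that turns a title into a schema name is not described here. As a rough sketch under a stated assumption (lowercase the title and replace runs of non-alphanumeric characters with underscores), the following hypothetical schema_name_from_title helper shows how a title could be checked against the 50-character limit and previewed as an identifier. The schema name generated by the product may differ.

```python
import re

MAX_TITLE_LENGTH = 50  # limit stated for the Data product title field


def schema_name_from_title(title: str) -> str:
    """Derive a schema-style identifier from a title (hypothetical rule)."""
    if len(title) > MAX_TITLE_LENGTH:
        raise ValueError(f"title exceeds {MAX_TITLE_LENGTH} characters")
    # Assumed rule: lowercase, runs of other characters become one underscore.
    name = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    if not name:
        raise ValueError("title contains no letters or digits")
    return name


print(schema_name_from_title("Customer Orders 2021"))
```
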

Define datasets#

In the Define datasets screen, select the catalog in which to store the schema using the Catalog dropdown. Catalogs using the Hive connector are supported.

Give the dataset a name in the Published dataset name field. It defines the name of the view that represents the dataset. You can access the dataset with this view name in your SQL queries in the configured catalog and schema of the data product. Add a short text description of the data to help users understand the contents of the dataset.
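
Because a dataset is a view inside the data product's schema, it is addressed by a three-part name: catalog, schema, and view. The sketch below builds such a query as a string. The catalog, schema, and view names are hypothetical placeholders, and the helper functions are illustrative rather than part of any Starburst client.

```python
def quote_identifier(name: str) -> str:
    """Wrap an identifier in double quotes, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def dataset_query(catalog: str, schema: str, view: str, limit: int = 10) -> str:
    """Build a SELECT statement against catalog.schema.view."""
    target = ".".join(quote_identifier(part) for part in (catalog, schema, view))
    return f"SELECT * FROM {target} LIMIT {int(limit)}"


# Hypothetical names: a Hive catalog, a data product schema, a dataset view.
print(dataset_query("hive", "customer_orders", "daily_totals"))
```

The resulting statement can be pasted into the query editor of the Query data pane.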

Once you have selected the appropriate dataset type, add the SELECT statement that creates the dataset to the Create dataset from query text box and click Done.

Click Show columns. A list with the columns created by the SELECT statement appears.

Enter column descriptions as desired to assist your users in better understanding the data.

If your data product has only one dataset, you are done. If you need to add more datasets, click Add another dataset. When you are done adding datasets, click Save and continue. The Data product details screen appears.

Add details#

The Data product details screen allows you to add the following information to a data product:

  • Data owner - Including name and email. Multiple owners can be assigned.

  • Domain (required) - The domain where the data resides.

  • Tags - Create a new tag or select from existing tags.

  • Links - Add one or more links relevant to the data. Includes link label text as well as the URL.

When all desired details are added, click Proceed and publish.

Domain management#

The Domain management pane allows you to create, edit and delete data domains that define your business. Domains are presented in a list view in alphabetical order. You can sort them in ascending or descending order using the arrow next to the Domain name column header.

Create a domain#

Click the Create new domain button above the list of domains. In the dialog that appears, enter a domain name and, optionally, a description for the domain. Click Create domain. The new domain appears in the list.

Edit or delete a domain#

Click the pencil icon for the domain you want to edit or delete. The domain’s editable information appears.

You can change the Domain name. It will automatically be changed for any data product assigned to it.

You can add or change the description in the Domain description field.

All data products must be assigned to a domain. You can reassign a data product to a domain either in its edit workflow, or by editing the domain. The edit screen for a domain contains a list of all currently assigned data products. You can reassign a data product to a different domain by clicking its Reassign link and selecting a new domain from the dropdown.

To reassign all data products to a different domain, click Reassign all and select a new domain to apply to all currently assigned data products.

To delete a domain, you must first reassign all data products to a different domain. When no data products are assigned to a domain, a Delete domain link appears in the domain’s edit screen. Click the link to delete the domain.
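
The rule that a domain can only be deleted once it has no assigned data products can be pictured with a toy in-memory model. The DomainRegistry class below is hypothetical and exists only to illustrate the order of operations (reassign first, then delete). It does not reflect how the product stores domains.

```python
class DomainRegistry:
    """Toy in-memory model of domains and their assigned data products."""

    def __init__(self):
        self._products_by_domain: dict[str, set[str]] = {}

    def create_domain(self, name: str) -> None:
        self._products_by_domain.setdefault(name, set())

    def assign(self, product: str, domain: str) -> None:
        # A product belongs to one domain, so drop any earlier assignment.
        for products in self._products_by_domain.values():
            products.discard(product)
        self._products_by_domain[domain].add(product)

    def reassign_all(self, source: str, target: str) -> None:
        if source == target:
            return
        self._products_by_domain[target] |= self._products_by_domain[source]
        self._products_by_domain[source] = set()

    def delete_domain(self, name: str) -> None:
        if self._products_by_domain[name]:
            raise ValueError(f"reassign data products before deleting {name!r}")
        del self._products_by_domain[name]

    def domains(self) -> list[str]:
        return sorted(self._products_by_domain)


registry = DomainRegistry()
registry.create_domain("sales")
registry.create_domain("marketing")
registry.assign("Orders", "sales")

# Deleting "sales" only succeeds after its products are moved elsewhere.
registry.reassign_all("sales", "marketing")
registry.delete_domain("sales")
print(registry.domains())
```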