Manage data products#
Manage your data products in two stages:
First create a data product and define its datasets. Changes made in this stage do not affect the data sources. The data product’s configuration is stored in the backend service database.
Publish your configured and tested data product. This creates the specified schema and views for the data product.
Click Publish data in the Data products dashboard to begin creating a data product, or click Edit from the Overview tab of a data product’s details screen to update an existing data product.
Note
Some fields marked required may be bypassed for draft data products. They are required for data products to be published.
Create a data product#
When you create or edit a data product, there are three screens to enter information:
Define data product
Define datasets
Data product details
Define data product#
Enter or edit the following information:
Data product title (required) - Enter a descriptive title. This is used for display purposes and to generate the schema name used to query the data. Make sure this name does not correspond to any existing schema in the data product’s catalog. This field is limited to 40 alphanumeric characters plus underscores. This field cannot be changed once the data product is published.
Catalog (required) - Use the drop-down list to select the catalog in which to store the schema. Only catalogs using the Hive connector are supported. This field cannot be changed once the data product is created.
Schema name - The value of this field is generated, based on the title. It cannot be edited.
Domain (required) - Use the drop-down list to assign a product to a domain.
Data product summary (required) - Enter a brief summary of 150 characters or less to be displayed in the list or grid view of the data products dashboard.
Data product description - This unlimited text field allows you to provide a detailed description. Include pertinent information to help users of your data product, such as its data granularity, intended use, and methodology.
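The length and character limits above can be checked before you fill in the form. The following is a minimal sketch, not part of SEP: the function name `check_definition` is hypothetical, and it assumes that "alphanumeric characters" means ASCII letters and digits. Whether the title field accepts spaces is not stated here, so the sketch rejects them.

```python
import re

# Limits taken from the field descriptions above.
TITLE_MAX = 40
SUMMARY_MAX = 150

# Assumption: only ASCII letters, digits, and underscores are allowed.
TITLE_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def check_definition(title: str, summary: str) -> list[str]:
    """Return a list of problems; an empty list means both fields pass."""
    problems = []
    if not title:
        problems.append("title is required")
    elif len(title) > TITLE_MAX:
        problems.append(f"title exceeds {TITLE_MAX} characters")
    elif not TITLE_PATTERN.fullmatch(title):
        problems.append("title contains unsupported characters")
    if not summary:
        problems.append("summary is required")
    elif len(summary) > SUMMARY_MAX:
        problems.append(f"summary exceeds {SUMMARY_MAX} characters")
    return problems


if __name__ == "__main__":
    print(check_definition("sales_2024", "Daily sales by region"))
```

The check only mirrors the documented limits. It does not verify that the generated schema name is unused in the catalog.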
When you are done adding the information, do one of the following:
Click Save and continue to proceed to the next screen. The Define datasets screen appears.
Click Save as draft. Your changes are marked as Draft and you are taken to the data products dashboard.
Define datasets#
The Define datasets screen contains tabs for each dataset across the top of the screen. Enter or edit the following information for your datasets:
Published dataset name (required) - Enter a descriptive name. This field is limited to 50 alphanumeric characters plus underscores.
Dataset description - This text field allows you to provide a detailed description. Include pertinent information such as grain, intended use, and how it relates to other datasets in the data product. Markdown is supported in this field.
Dataset type (required) - You must select a type for your dataset, either View or Materialized view.
Selecting View creates a view. This stores the SQL definition and executes the query whenever the data product is accessed.
Selecting Materialized view creates a materialized view of the data. This creates an actual storage table with the data, and provides better query performance. You must provide a refresh interval in minutes.
Query (required) - Enter the query that defines your dataset.
Note
The Materialized view dataset type is only available for Hive catalogs with materialized views enabled.
Once you have provided the required information, you must click Show columns and add column descriptions. SEP runs the query and loads a list of the columns created by the SELECT statement.
Enter column descriptions to assist your users to better understand the data. This step is optional for each column but recommended.
Click Preview to open a pop-up window with an example result set of your query, limited to ten rows.
If your data product has only one dataset, you are done. If you need to add more datasets, click Add another dataset. When you are done adding datasets, do one of the following:
Click Save and continue to proceed to the next screen. The Data product details screen appears.
Click Save as draft. Your changes are marked as Draft and you are taken to the data products dashboard.
Note
When viewing a dataset, you can clone it by clicking the clone (stacked paper) icon next to the dataset’s name. You are prompted to enter a new name for the cloned dataset. Cloning is not available in Edit mode.
Data product details#
The Data product details screen allows you to add the following information to a data product:
Owner (required) - Including name and email. Multiple owners can be assigned.
Tags - Create a new tag or select from existing tags.
Links - Add one or more links relevant to the data. Includes link label text as well as the URL.
When all desired details are added, do one of the following:
Click Save and review to proceed to the next screen. The overview screen for the data product appears.
Click Save as draft. Your changes are marked as Draft and you are taken to the data products dashboard.
In the Datasets section or the Usage examples tab, open a new query editor tab by clicking the query icon < > next to the name of the dataset you want to query.
Publish a data product#
When you finish creating or editing a data product, the Data product details pane shows a Publish button. The data product’s schema and views do not exist until you publish the product.
The following information is required in order to publish a data product:
Name
Catalog
Domain
Summary
At least one dataset with name and query
At least one data product owner
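The publish requirements above amount to a simple completeness check. The following sketch models them with hypothetical `DataProduct` and `Dataset` classes. These names and fields are illustrative assumptions and do not correspond to any SEP API.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str = ""
    query: str = ""


@dataclass
class DataProduct:
    # Hypothetical model of the fields named in the list above.
    name: str = ""
    catalog: str = ""
    domain: str = ""
    summary: str = ""
    datasets: list[Dataset] = field(default_factory=list)
    owners: list[str] = field(default_factory=list)


def missing_for_publish(product: DataProduct) -> list[str]:
    """Return the publish requirements that the product does not yet meet."""
    required = (
        ("Name", product.name),
        ("Catalog", product.catalog),
        ("Domain", product.domain),
        ("Summary", product.summary),
    )
    missing = [label for label, value in required if not value.strip()]
    # A dataset only counts if it has both a name and a query.
    if not any(d.name.strip() and d.query.strip() for d in product.datasets):
        missing.append("At least one dataset with name and query")
    if not product.owners:
        missing.append("At least one data product owner")
    return missing


if __name__ == "__main__":
    draft = DataProduct(name="sales", catalog="hive")
    for item in missing_for_publish(draft):
        print("missing:", item)
```

A draft can be saved with some of these items missing, which is why the check returns a list instead of raising an error.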
Click Publish to create the data product, or to update an existing data product. This has the following effects:
Creates the data product as a schema in the data product’s catalog if it does not yet exist. The schema is created in the domain’s default location if provided, or in the catalog’s configured default location.
Creates its defined datasets as views or materialized views in that schema.
The data product’s status transitions to Published, and is no longer a Draft.
Edit or remove a published data product#
If you edit a published data product’s definition, such as adding or removing datasets, or editing a dataset’s query, the changes are not automatically reflected in the datasets. Instead, the data product transitions to the Pending changes status. The changes are only synced with the data sources when you click Publish again.
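The status changes described above can be pictured as a small state machine. This is an illustrative sketch only: the `Status` enum and the two functions are hypothetical names, and the sketch covers just the three statuses mentioned in this document.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "Draft"
    PUBLISHED = "Published"
    PENDING_CHANGES = "Pending changes"


def after_edit(status: Status) -> Status:
    """Status after editing the definition without publishing."""
    # Editing a published product defers the changes until the next publish.
    if status is Status.PUBLISHED:
        return Status.PENDING_CHANGES
    # A draft stays a draft; pending changes stay pending.
    return status


def after_publish(status: Status) -> Status:
    """Status after clicking Publish, which syncs the definition to the catalog."""
    return Status.PUBLISHED


if __name__ == "__main__":
    status = after_publish(Status.DRAFT)
    status = after_edit(status)
    print(status.value)
    print(after_publish(status).value)
```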
You can change the type of a draft dataset. To change the type of a published dataset, you must publish the data product again. Until then, the existing dataset is marked for deletion and the dataset with the new type remains a draft.
If you delete a published data product, this also deletes its schema and views or materialized views from the catalog.
Data product security#
The views that implement a dataset are created with SECURITY DEFINER mode.
The logged-in username that publishes the data product becomes the view owner in the catalog.
If built-in access control or another access control system is enabled, to publish a data product, the view owner’s role must have the following privileges, or be a member of a group role granted these privileges:
Object secured | Privilege | Notes |
---|---|---|
Data products | | For the data product’s domain or for a single data product. |
Queries | | |
Tables | | For all tables in the dataset definition. Must specify the Allow role receiving grant to grant to others option. |
Tables | | In the data product’s catalog, to allow creating schemas and views. |
The following shows additional privileges that must be granted to the role of the user or group to allow data products management tasks:
Object secured | Privilege | Notes |
---|---|---|
Data products | | To allow creating a domain. |
Data products | | To allow editing data products in a domain or to allow editing a domain. |
Data products | | To allow deleting a data product or a domain. |
Tables | | To allow refreshing materialized views, if those are used in the dataset. |
You specify a data product user with the data-product.starburst-user property in the initial data product configuration. This data product user impersonates the logged-in user when it executes data product operations on the data source, such as creating schemas and views.
When impersonating, all roles of the impersonated user are enabled, other than the sysadmin role. When publishing a data product as sysadmin, make sure the privileges listed in the tables above are granted to a role other than sysadmin.
SEP’s content security policy (CSP) prevents the rendering of external images by default. To allow images, you must configure the http-server.content-security-policy property. The value you set replaces the default policy, so include all of the default directives along with your changes, as in the following examples:
This is the default CSP value:
http-server.content-security-policy=default-src 'self'; script-src 'self'; style-src 'self' 'unsafe-inline' fonts.googleapis.com; img-src 'self' data:; font-src 'self' fonts.gstatic.com data:; frame-ancestors 'self';
Allow images from a specified domain. The following example allows images from upload.wikimedia.org:
http-server.content-security-policy=default-src 'self'; script-src 'self'; style-src 'self' 'unsafe-inline' fonts.googleapis.com; img-src 'self' upload.wikimedia.org data:; font-src 'self' fonts.gstatic.com data:; frame-ancestors 'self';
Separate multiple domains with a space. The following example allows images from upload.wikimedia.org and upload.different.org:
http-server.content-security-policy=default-src 'self'; script-src 'self'; style-src 'self' 'unsafe-inline' fonts.googleapis.com; img-src 'self' http://upload.wikimedia.org http://upload.different.org data:; font-src 'self' fonts.gstatic.com data:; frame-ancestors 'self';
You can use * to allow all domains:
http-server.content-security-policy=default-src 'self'; script-src 'self'; style-src 'self' 'unsafe-inline' fonts.googleapis.com; img-src * data:; font-src 'self' fonts.gstatic.com data:; frame-ancestors 'self';
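Because the property value must repeat every default directive, it can be less error-prone to generate it than to edit it by hand. The following sketch starts from the default value shown above and adds sources to the img-src directive only. The helper names `allow_image_sources` and `property_line` are hypothetical and not part of SEP.

```python
# Default policy value, copied from the first example above.
DEFAULT_CSP = (
    "default-src 'self'; script-src 'self'; "
    "style-src 'self' 'unsafe-inline' fonts.googleapis.com; "
    "img-src 'self' data:; "
    "font-src 'self' fonts.gstatic.com data:; "
    "frame-ancestors 'self';"
)


def allow_image_sources(csp: str, sources: list[str]) -> str:
    """Return csp with the given sources added to its img-src directive."""
    directives = [d.strip() for d in csp.split(";") if d.strip()]
    updated = []
    for directive in directives:
        name, *values = directive.split()
        if name == "img-src":
            # Skip duplicates and sources that are already allowed.
            new = [s for s in dict.fromkeys(sources) if s not in values]
            # Insert before "data:" to match the layout of the examples above.
            position = values.index("data:") if "data:" in values else len(values)
            values[position:position] = new
        updated.append(" ".join([name, *values]))
    return "; ".join(updated) + ";"


def property_line(csp: str) -> str:
    """Format the policy as a configuration property line."""
    return f"http-server.content-security-policy={csp}"


if __name__ == "__main__":
    print(property_line(allow_image_sources(DEFAULT_CSP, ["upload.wikimedia.org"])))
```

All other directives pass through unchanged, so the output differs from the default only in the img-src entry.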