Manage data products#
Manage your data products in two stages:
First create a data product and define its datasets. Changes made in this stage do not affect the data sources. The data product’s configuration is stored in the backend service database. You can also import or export a data product using Data Products as Code, available as a private preview.
Publish your configured and tested data product. This creates the specified schema and views for the data product.
Click Publish data in the Data products dashboard to begin creating a data product, or click Edit from the Overview tab of a data product’s details screen to update an existing data product.
Note
Some fields marked required can be left empty in draft data products, but they must be completed before the data product can be published.
Create a data product#
When you create or edit a data product, there are three screens to enter information:
Define data product
Define datasets
Data product details
Define data product#
Enter or edit the following information:
Data product title (required) - Enter a descriptive title. The title is used for display purposes and to generate the schema name used to query the data. Make sure the title does not correspond to any existing schema in the data product’s catalog. This field is limited to 100 characters; only alphanumeric characters and underscores are valid. This field cannot be changed once the data product is published.
Catalog (required) - Use the drop-down list to select the catalog in which to store the schema. Catalogs using the Hive or Iceberg connectors are supported, including Starburst Warp Speed-enabled Hive and Iceberg catalogs. This field cannot be changed once the data product is created.
Note
Delta Lake catalogs with Starburst Warp Speed are not supported for data products.
Schema name (required) - The value of this field is automatically generated from the data product title, but you can set a custom name. Only lowercase alphanumeric characters and underscores are valid, and the field is limited to 100 characters. You cannot use a schema name that is already in use.
Domain (required) - Use the drop-down list to assign the data product to a domain.
Data product summary (required) - Enter a brief summary of 150 characters or less to be displayed in the list or grid view of the data products dashboard.
Data product description - This unlimited text field allows you to provide a detailed description. Include pertinent information to help users of your data product, such as its data granularity, intended use, and methodology.
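The automatically generated schema name follows the constraints described above: lowercase alphanumeric characters and underscores, at most 100 characters. The following sketch shows one plausible title-to-schema-name mapping; it is an illustration, not SEP’s actual generation algorithm:

```python
import re

def title_to_schema_name(title: str) -> str:
    """Illustrative sketch: derive a schema name from a data product title.

    Lowercases the title, collapses every run of non-alphanumeric
    characters into a single underscore, trims stray underscores,
    and caps the result at 100 characters.
    """
    name = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    return name[:100]

print(title_to_schema_name("Customer 360 - Sales View"))  # customer_360_sales_view
```

Whatever the exact rule SEP applies, the resulting name must not collide with an existing schema in the selected catalog.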
When you are done adding the information, do one of the following:
Click Save and continue to proceed to the next screen. The Define datasets screen appears.
Click Save as draft. Your changes are marked as Draft and you are taken to the data products dashboard.
Define datasets#
The Define datasets screen contains tabs for each dataset across the top of the screen. Enter or edit the following information for your datasets:
Published dataset name (required) - Enter a descriptive name. This field is limited to 128 characters; only alphanumeric characters and underscores are valid.
Dataset description - This text field allows you to provide a detailed description. Include pertinent information such as grain, intended use, and how it relates to other datasets in the data product. Markdown is supported in this field.
Dataset type (required) - You must select a type for your dataset, either View or Materialized view.
Selecting View creates a view. This stores the SQL definition and executes the query each time the view is queried.
Selecting Materialized view creates a materialized view of the data. This creates an actual storage table with the data, and provides better query performance. You must specify a refresh schedule using one of the following options:
Refresh interval: Enter the number of minutes between refreshes.
Enter cron expression: Enter a cron expression for more precise scheduling.
Optionally, select Override default storage schema to choose a predefined custom storage schema for the materialized view.
Query (required) - Enter the query that defines your dataset.
Note
The Materialized view dataset type is only available for Hive catalogs with materialized views enabled.
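When scheduling a materialized view refresh with a cron expression, standard five-field cron syntax applies. The following examples illustrate the format (whether extended syntax such as step values is accepted here is an assumption; verify against your SEP version):

```
# minute  hour  day-of-month  month  day-of-week
0 2 * * *      # every day at 02:00
0 6 * * 1      # every Monday at 06:00
0 0 1 * *      # at midnight on the first day of each month
```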
Once you have provided the required information, click Show columns. SEP runs the query and loads a list of the columns created by the SELECT statement.
Enter column descriptions to help your users better understand the data. Descriptions are optional for each column, but recommended.
Click Preview to open a pop-up window with an example result set of your query, limited to ten rows.
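A dataset-defining query is an ordinary read-only SELECT statement against one or more catalogs. The following brief example is illustrative only; the hive.sales schema and its tables are hypothetical:

```sql
SELECT
  o.orderkey,
  o.orderdate,
  c.name AS customer_name,
  o.totalprice
FROM hive.sales.orders o
JOIN hive.sales.customers c ON o.custkey = c.custkey
WHERE o.orderdate >= DATE '2024-01-01'
```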
If your data product has only one dataset, you are done. If you need to add more datasets, click Add another dataset. When you are done adding datasets, do one of the following:
Click Save and continue to proceed to the next screen. The Data product details screen appears.
Click Save as draft. Your changes are marked as Draft and you are taken to the data products dashboard.
Note
When viewing a dataset, you can clone it by clicking the clone (stacked paper) icon next to the dataset’s name. You are prompted to enter a new name for the cloned dataset. Cloning is not available in Edit mode.
Data product details#
The Data product details screen allows you to add the following information to a data product:
Owner (required) - Enter the owner’s name and email address. Multiple owners can be assigned.
Tags - Create a new tag or select from existing tags.
Links - Add one or more links relevant to the data. Each link includes label text and a URL.
When all desired details are added, do one of the following:
Click Save and review to proceed to the next screen. The overview screen for the data product appears.
Click Save as draft. Your changes are marked as Draft and you are taken to the data products dashboard.
On the data product’s overview screen, in the Datasets section or the Usage examples tab, open a new query editor tab by clicking the query icon < > next to the name of the dataset you want to query.
Publish a data product#
When you finish creating or editing a data product, the Data product details pane shows a Publish button. The data product’s schema and views do not exist until you publish the product.
The following information is required to publish a data product:
Name
Catalog
Domain
Summary
At least one dataset with name and query
At least one data product owner
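Conceptually, the publish gate is a completeness check over these fields. The following Python sketch models that check; the field names are illustrative and do not reflect SEP’s internal schema or API:

```python
def missing_for_publish(dp: dict) -> list[str]:
    """Return which publish requirements a draft data product is missing.

    Field names here are illustrative, not SEP's actual data model.
    """
    missing = []
    for field in ("name", "catalog", "domain", "summary"):
        if not dp.get(field):
            missing.append(field)
    # At least one dataset must have both a name and a defining query.
    if not any(d.get("name") and d.get("query") for d in dp.get("datasets", [])):
        missing.append("dataset with name and query")
    # At least one owner must be assigned.
    if not dp.get("owners"):
        missing.append("owner")
    return missing

draft = {
    "name": "Sales 360",
    "catalog": "hive",
    "datasets": [{"name": "orders_view", "query": "SELECT 1"}],
}
print(missing_for_publish(draft))  # ['domain', 'summary', 'owner']
```

A draft failing this check can still be saved; it simply cannot be published until the gaps are filled.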
Click Publish to create the data product, or to update an existing data product. This has the following effects:
Creates the data product as a schema in the data product’s catalog if it does not yet exist. The schema is created in the domain’s default location if provided, or in the catalog’s configured default location.
Creates its defined datasets as views or materialized views in that schema.
Transitions the data product’s status from Draft to Published.
Edit or remove a published data product#
If you edit a published data product’s definition, such as adding or removing datasets, or editing a dataset’s query, the changes are not automatically reflected in the datasets. Instead, the data product transitions to the Pending changes status. The changes are only synced with the data sources when you click Publish again.
You can change the type of a draft dataset. To change the type of a published dataset, you must publish the data product again; until then, the existing dataset is marked for deletion and the dataset with the new type remains a draft.
If you delete a published data product, this also deletes its schema and views or materialized views from the catalog.
Import or export a data product#
Data Products as Code lets users represent, export, and import Starburst data products as YAML files. This provides a standard, repeatable definition for data products and supports versioning and auditing of metadata changes through the Starburst Enterprise web UI and Starburst Enterprise platform (SEP) REST API.
Note
Data Products as Code is available as a private preview in Starburst Enterprise. By default, the feature is visible in the data products dashboard but not enabled. Contact your Starburst account team with questions or feedback.
To import or export a data product, the starburst.data-product-as-code.enabled=true configuration property must be set.
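For example, assuming the property is set in the coordinator’s config.properties (verify the correct configuration file for your deployment):

```properties
starburst.data-product-as-code.enabled=true
```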
Export a data product#
Export a data product to download its definition as a YAML file. To export a data product, you must have the SHOW privilege to the data product or domain you want to export.
In the Data products dashboard, click the more_vert options menu of the data product you want to export. You cannot export shared data products.
Select Download as YAML.
In the Export Template dialog, click Download as YAML.
Import a data product#
Import a YAML template to create a new data product or overwrite an existing one. To import a data product, you must have the CREATE privilege. If you want to overwrite an existing data product and are not the owner, you must also have the ALTER privilege.
Click Import data product.
In the Import Data Product dialog, drag and drop a YAML file or click to select one. A data product name must be unique within a domain, and the YAML file size must be 10 MB or less.
Optionally, select Overwrite existing data product of the same name to replace an existing data product in the same domain. If you do not select this option and a data product with the same name exists, the import fails.
Click Import.
In the Data product details pane, click Publish.
The following example shows a complete YAML file for a data product using Data Products as Code.
apiVersion: v1
kind: DataProduct
metadata:
  name: Comprehensive Data Product
  catalogName: analytics_catalog
  schemaName: comprehensive_schema
  dataDomainName: Enterprise Analytics Domain
  summary: Complete data product with all fields populated for comprehensive testing
  description: |
    # Comprehensive Data Product
    This is a **complete** data product definition with:
    - Multiple owners
    - Multiple relevant links
    - Multiple tags
    - Multiple sample queries
    - Multiple views with multiple columns
    - Multiple materialized views with complex properties
    - Export metadata with all timestamps
  owners:
    - name: Alice Johnson
      email: alice.johnson@example.com
    - name: Bob Smith
      email: bob.smith@example.com
  relevantLinks:
    - label: Documentation
      url: https://docs.example.com
    - label: Architecture Diagram
      url: https://diagrams.example.com
  tags:
    - analytics
    - marketing
    - enterprise
    - real-time
    - historical
  sampleQueries:
    - name: Daily Active Users
      description: Count of unique users active each day in the last 30 days
      query: |
        SELECT
          event_date,
          COUNT(DISTINCT user_id) as active_users
        FROM user_activity
        WHERE event_date >= CURRENT_DATE - INTERVAL '30' DAY
        GROUP BY event_date
        ORDER BY event_date DESC
    - name: Top 10 Customers
      query: "SELECT customer_id, customer_name, total_purchases FROM customer_summary ORDER BY total_purchases DESC LIMIT 10"
views:
  - name: user_activity
    description: Cleaned and enriched user activity events with user dimensions
    viewSecurityMode: DEFINER
    definitionQuery: |
      SELECT
        e.event_id,
        e.event_timestamp,
        e.event_type,
        e.user_id,
        u.user_name,
        u.user_email,
        u.user_segment,
        e.session_id,
        e.device_type,
        e.country_code,
        e.properties
      FROM raw.events.user_events e
      LEFT JOIN raw.dimensions.users u ON e.user_id = u.id
      WHERE e.event_timestamp >= CURRENT_DATE - INTERVAL '90' DAY
    columns:
      - name: event_id
        type: varchar
        description: Unique identifier for the event
      - name: event_timestamp
        type: timestamp(3) with time zone
        description: When the event occurred
      - name: event_type
        type: varchar
        description: "Type of event (click, view, purchase, etc.)"
      - name: user_id
        type: varchar
        description: User identifier
      - name: user_name
        type: varchar
        description: Full name of the user
      - name: user_email
        type: varchar
        description: Email address of the user
      - name: user_segment
        type: varchar
        description: "User segment (premium, standard, trial)"
      - name: session_id
        type: varchar
        description: Session identifier
      - name: device_type
        type: varchar
        description: "Device type (mobile, desktop, tablet)"
      - name: country_code
        type: varchar(2)
        description: ISO 3166-1 alpha-2 country code
      - name: properties
        type: "map(varchar, varchar)"
        description: Additional event properties as key-value pairs
  - name: sales_summary
    description: "Aggregated sales data by product, category, and time dimensions"
    viewSecurityMode: INVOKER
    definitionQuery: |
      SELECT
        DATE_TRUNC('day', s.sale_timestamp) as sale_date,
        YEAR(s.sale_timestamp) as year,
        MONTH(s.sale_timestamp) as month,
        p.product_id,
        p.product_name,
        p.category,
        c.customer_id,
        c.customer_segment,
        SUM(s.quantity) as total_quantity,
        SUM(s.amount) as total_revenue,
        AVG(s.amount) as avg_transaction_value
      FROM raw.transactions.sales s
      LEFT JOIN raw.dimensions.products p ON s.product_id = p.id
      LEFT JOIN raw.dimensions.customers c ON s.customer_id = c.id
      GROUP BY 1, 2, 3, 4, 5, 6, 7, 8
    columns:
      - name: sale_date
        type: date
        description: Date of the sale
      - name: year
        type: integer
        description: Year of the sale
      - name: month
        type: integer
        description: Month of the sale (1-12)
      - name: product_id
        type: varchar
        description: Product identifier
      - name: product_name
        type: varchar
        description: Name of the product
      - name: category
        type: varchar
        description: Product category
      - name: customer_id
        type: varchar
        description: Customer identifier
      - name: customer_segment
        type: varchar
        description: Customer segment
      - name: total_quantity
        type: bigint
        description: Total units sold
      - name: total_revenue
        type: "decimal(18,2)"
        description: Total revenue in USD
      - name: avg_transaction_value
        type: "decimal(18,2)"
        description: Average transaction value
materializedViews:
  - name: daily_user_metrics
    description: Pre-aggregated daily user metrics for fast dashboard queries
    definitionQuery: |
      SELECT
        DATE_TRUNC('day', event_timestamp) as metric_date,
        user_id,
        user_segment,
        COUNT(*) as event_count,
        COUNT(DISTINCT session_id) as session_count,
        COUNT(DISTINCT CASE WHEN event_type = 'purchase' THEN event_id END) as purchase_count,
        SUM(CASE WHEN event_type = 'purchase' THEN CAST(properties['amount'] AS DECIMAL(18,2)) END) as total_revenue,
        MIN(event_timestamp) as first_event_time,
        MAX(event_timestamp) as last_event_time
      FROM user_activity
      GROUP BY 1, 2, 3
    definitionProperties:
      grace_period: 5m
      incremental_column: metric_date
      refresh_interval: 1h
      storage_schema: mv_storage
    columns:
      - name: metric_date
        type: date
        description: Date of the metrics
      - name: user_id
        type: varchar
        description: User identifier
      - name: user_segment
        type: varchar
        description: User segment
      - name: event_count
        type: bigint
        description: Total number of events
      - name: session_count
        type: bigint
        description: Number of distinct sessions
      - name: purchase_count
        type: bigint
        description: Number of purchase events
      - name: total_revenue
        type: "decimal(18,2)"
        description: Total revenue from purchases
      - name: first_event_time
        type: timestamp(3) with time zone
        description: Timestamp of first event
      - name: last_event_time
        type: timestamp(3) with time zone
        description: Timestamp of last event
  - name: customer_lifetime_value
    description: Customer lifetime value metrics refreshed weekly
    definitionQuery: |
      SELECT
        c.customer_id,
        c.customer_segment,
        c.signup_date,
        COUNT(DISTINCT s.sale_date) as purchase_days,
        SUM(s.total_quantity) as lifetime_quantity,
        SUM(s.total_revenue) as lifetime_revenue,
        AVG(s.total_revenue) as avg_order_value,
        MIN(s.sale_date) as first_purchase_date,
        MAX(s.sale_date) as last_purchase_date,
        date_diff('day', MIN(s.sale_date), MAX(s.sale_date)) as customer_age_days
      FROM raw.dimensions.customers c
      LEFT JOIN sales_summary s ON c.customer_id = s.customer_id
      GROUP BY 1, 2, 3
    definitionProperties:
      refresh_schedule: "0 0 * * 0"
      storage_schema: analytics_warehouse
    columns:
      - name: customer_id
        type: varchar
        description: Customer identifier
      - name: customer_segment
        type: varchar
        description: Customer segment
      - name: signup_date
        type: date
        description: Date customer signed up
      - name: purchase_days
        type: bigint
        description: Number of days with purchases
      - name: lifetime_quantity
        type: bigint
        description: Total items purchased
      - name: lifetime_revenue
        type: "decimal(18,2)"
        description: Total lifetime revenue
      - name: avg_order_value
        type: "decimal(18,2)"
        description: Average order value
      - name: first_purchase_date
        type: date
        description: Date of first purchase
      - name: last_purchase_date
        type: date
        description: Date of most recent purchase
      - name: customer_age_days
        type: integer
        description: Days between first and last purchase
exportMetadata:
  exportedAt: 2025-12-09T15:30:45.123Z
  exportedBy: data.engineer@example.com
  status: PUBLISHED
  createdAt: 2025-11-15T09:00:00Z
  publishedAt: 2025-11-20T14:30:00Z
Republish a data product#
Click the Republish button to recreate all views or materialized views in a data product. Use this option when source table schemas change but the data product definition does not change.
Select the Force republish data product checkbox to confirm, then click Republish.
Data product security#
The views that implement a dataset are created with SECURITY DEFINER mode.
The logged-in username that publishes the data product becomes the view owner in the catalog.
If built-in access control or another access control system is enabled, to publish a data product, the view owner’s role must have the following privileges, or be a member of a group role granted these privileges:
| Object secured | Privilege | Notes |
|---|---|---|
| Data products | | For the data product’s domain or for a single data product. |
| Queries | | |
| Tables | | For all tables in the dataset definition. Must specify the Allow role receiving grant to grant to others option. |
| Tables | | In the data product’s catalog, to allow creating schemas and views. |
The following shows additional privileges that must be granted to the role of the user or group to allow data products management tasks:
| Object secured | Privilege | Notes |
|---|---|---|
| Data products | | To allow creating a domain. |
| Data products | | To allow editing data products in a domain or to allow editing a domain. |
| Data products | | To allow deleting a data product or a domain. |
| Tables | | To allow refreshing materialized views, if those are used in the dataset. |
You specify a data product user with the
data-product.starburst-user property in the initial data product configuration. This data product user impersonates the
logged-in user when it executes data product operations on the data source, such
as creating schemas and views.
When impersonating, all roles of the impersonated user are enabled, other than
the sysadmin role. When publishing a data product as sysadmin, make sure
the privileges listed in this table are granted to a role other than
sysadmin.
SEP’s content security policy (CSP) prevents the rendering of external images by
default. To allow images, you must configure the
http-server.content-security-policy property. Include all of the default values
so they are not overridden, as in the following examples:
This is the default CSP value:
http-server.content-security-policy=default-src 'self'; script-src 'self'; style-src 'self' 'unsafe-inline' fonts.googleapis.com; img-src 'self' data:; font-src 'self' fonts.gstatic.com data:; frame-ancestors 'self';
Allow images from a specified domain. The following example allows images from upload.wikimedia.org:
http-server.content-security-policy=default-src 'self'; script-src 'self'; style-src 'self' 'unsafe-inline' fonts.googleapis.com; img-src 'self' upload.wikimedia.org data:; font-src 'self' fonts.gstatic.com data:; frame-ancestors 'self';
Separate multiple domains with a space. The following example allows images from upload.wikimedia.org and upload.different.org:
http-server.content-security-policy=default-src 'self'; script-src 'self'; style-src 'self' 'unsafe-inline' fonts.googleapis.com; img-src 'self' http://upload.wikimedia.org http://upload.different.org data:; font-src 'self' fonts.gstatic.com data:; frame-ancestors 'self';
You can use * to allow all domains:
http-server.content-security-policy=default-src 'self'; script-src 'self'; style-src 'self' 'unsafe-inline' fonts.googleapis.com; img-src * data:; font-src 'self' fonts.gstatic.com data:; frame-ancestors 'self';