Enable Starburst Warp Speed for your cluster#
Starburst Warp Speed transparently adds an indexing and caching layer to enable higher performance. You can take advantage of the performance improvements by updating your cluster to suitable hardware and configuring the Starburst Warp Speed utility connector for any catalog accessing object storage with the Hive, Iceberg, or Delta Lake connector.
Requirements#
To use Starburst Warp Speed, you need:
One or more catalogs that use the Hive, Iceberg, or Delta Lake connectors.
A cluster deployment on a supported Kubernetes-based platform.
A valid Starburst Enterprise license.
Platform-specific requirements#
Starburst Warp Speed requires your cluster to operate on a supported Kubernetes-based platform.
In addition, Starburst Warp Speed requires nodes with specific CPU and memory resources. The most important additional requirement is sufficiently performant and sized Non-Volatile Memory Express (NVMe) solid-state drive (SSD) storage, available on all nodes, including the coordinator, and used exclusively by SEP.
Warning
Starburst strongly recommends SSDs that offer encryption at rest. Cached data stored on unencrypted SSDs may be exposed if the underlying hardware is accessed outside of SEP.
The following specific details apply for the supported platforms:
EKS#
The following instance types are known to meet Starburst Warp Speed’s NVMe SSD requirements. This is not an exhaustive list. Any instance with directly-attached NVMe SSD storage and sufficient CPU and memory may be suitable.
Sample node sizes:
m8gd.4xlarge or larger
m7gd.4xlarge or larger
r8gd.4xlarge or larger
r7gd.4xlarge or larger
m6id.4xlarge or larger
m6idn.4xlarge or larger
m6gd.4xlarge or larger
r5d.4xlarge or larger
r5dn.4xlarge or larger
r6gd.4xlarge or larger
i3.4xlarge or larger
Create an EKS Managed Node Group with the desired size. Use this node group for
all nodes in the cluster. Use the latest generally available release of
eksctl.
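As a sketch, such a managed node group might be declared in an eksctl configuration file as follows. The cluster name, region, node count, and chosen instance type are placeholder assumptions, not required values:

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: sep-cluster    # placeholder cluster name
  region: us-east-1    # placeholder region
managedNodeGroups:
  - name: sep-warp-speed-nodes
    instanceType: m6id.4xlarge   # NVMe-backed instance type from the list above
    desiredCapacity: 4           # placeholder node count
```

Apply the configuration with eksctl, for example with eksctl create cluster -f cluster.yaml.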
For information on specific deployment scenarios based on your privileges and node types, see the Cluster configuration section.
AKS#
The following instance types are known to meet Starburst Warp Speed’s NVMe SSD requirements. This is not an exhaustive list. Any instance with directly-attached NVMe SSD storage and sufficient CPU and memory may be suitable.
Sample node sizes:
Standard_L16s_v2 or larger Lsv2-series Azure VMs, such as Standard_L32s_v2, with SSDs attached. Lsv2-series VM SSDs are not encrypted by Azure Storage encryption; we strongly recommend Standard_L16s_v3 or larger Lsv3-series VMs instead.
Standard_Dpdsv6 (ARM-based) with Premium SSD v2. Starburst recommends these ARM-based VMs for the best price-performance ratio for running Starburst Warp Speed in Azure environments. Premium SSD v2 is the only storage type that supports both encryption at rest and ephemeral storage.
In the Starburst Warp Speed section of values.yaml, set the following configuration:
warpSpeed:
  image:
    tag: "a.b.c-azure"
GKE#
The following instance types are known to meet Starburst Warp Speed’s NVMe SSD requirements. This is not an exhaustive list. Any instance with directly-attached NVMe SSD storage and sufficient CPU and memory may be suitable.
Sample node sizes:
n2-highmem-16 or larger, with a minimum of two local NVMe SSDs attached
n2d-standard-16 or larger
C4A Axion (ARM-based) with Titanium SSD. Starburst recommends these ARM-based VMs for the best price-performance ratio for running Starburst Warp Speed in Google Cloud environments.
GKE version 1.25.3-gke.1800 or higher
Attach the SSDs during node pool creation, and use the pool for the cluster creation.
Caution
Use the gcloud CLI, not Google Cloud console, for node pool creation. Using the Google console UI creates incompatible disk types.
Use the ephemeral-storage-local-ssd option of the gcloud CLI to provision local
SSDs on the cluster. Select an even number of workers.
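As a sketch, node pool creation with local NVMe SSDs could look like the following gcloud invocation; the cluster name, pool name, and node count are placeholders, and the flags should be verified against the current gcloud reference:

```shell
gcloud container node-pools create sep-warp-speed-pool \
  --cluster=sep-cluster \
  --machine-type=n2-highmem-16 \
  --ephemeral-storage-local-ssd count=2 \
  --num-nodes=4
```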
On-premises#
Directly-attached NVMe SSDs on all nodes are recommended. Network-attached drives are supported but offer reduced performance.
A minimum of 1 TB total disk space is recommended.
All platforms#
Configuration considerations:
The task.max-worker-threads task property cannot be changed with Starburst Warp Speed, and must be left at the default value.
SEP clusters with a vCPU count of 16 or fewer per worker node support a maximum of four Starburst Warp Speed catalogs. Clusters with larger worker nodes support a maximum of seven Starburst Warp Speed catalogs.
Warning
Starburst Warp Speed is not supported on a cluster running in Fault-tolerant execution mode.
Starburst Warp Speed uses local storage as an ephemeral cache. Replicating storage volumes for Starburst Warp Speed is unnecessary for ephemeral cache data and reduces the total storage capacity available for caching and indexing.
Deployment and management is performed with the SEP Helm charts detailed in Deploy with Kubernetes and Configuring Starburst Enterprise in Kubernetes.
Cluster configuration#
The following sections outline the step-by-step process for configuring and deploying Starburst Warp Speed on your Kubernetes cluster.
General configuration#
Add the following section in your values file for the specific cluster to enable Starburst Warp Speed:
warpSpeed:
  enabled: true
Starburst Warp Speed uses a filesystem on top of the underlying storage so that privileged mode is not required. The following configuration block shows the Warp Speed and filesystem options. Add a property to your values file only when you override its default:
warpSpeed:
  enabled: true
  # Physical drive configuration requires the Warp Speed init container within
  # a worker pod to be privileged.
  # When configuration is done, the container's lifecycle ends.
  # If autoConfigure is not enabled, devices must be manually configured
  # before SEP pods are started, for example during the node machine
  # bootstrap process (preBootstrapCommands script in EKS deployments).
  # This functionality is available for AKS, EKS, and GKE only.
  # For AWS, use the default setting.
  # For Azure and GKE deployments, or S3 deployments using the native
  # filesystem implementation, set autoConfigure to true.
  autoConfigure: true
  # Additional percentage of container memory subtracted from the heap size
  # assigned to Java. Must be less than 100.
  additionalHeapSizePercentage: 15
  fileSystem:
    # The path for the mount point used to mount the local SSDs. The value
    # differs between clouds:
    # AWS, Azure - defined by the user
    # Google Cloud - must be set to /mnt/stateful_partition/kube-ephemeral-ssd/data
    localStorageMountPath: /opt/data/<subdirectory>
  image:
    # Image that prepares the filesystem for Starburst Warp Speed.
    # Due to system limitations, this must be done by an init container running
    # in privileged mode.
    repository: "harbor.starburstdata.net/starburstdata/starburst-warpspeed-init"
    # AWS with an NVMe RAID preconfigured by other tools or eksctl
    # preBootstrapCommands - set `autoConfigure: false` and use the default tag.
    # AWS with Bottlerocket nodes - set `autoConfigure: false` and ignore this setting.
    # AWS with Linux nodes - set `autoConfigure: true` and add the `-awsprivileged`
    # suffix, for example 1.0.19-awsprivileged.
    # GCP with preconfigured drives (details in Starburst Enterprise docs) - use the default tag.
    # Azure - add the `-azure` suffix, for example 1.0.19-azure.
    tag: "1.0.19"
Ensure that you use a dedicated coordinator that is not scheduled for query processing, and adjust the query processing configuration to allow for more splits:
coordinator:
  additionalProperties: |
    ...
    node-scheduler.include-coordinator=false
    node-scheduler.max-splits-per-node=4096
Use Helm to update the values and restart the cluster nodes. Confirm the cluster is operating correctly with the new configuration, but without any adjusted catalogs, and then proceed to configure catalogs.
Related to the catalog usage, the cluster needs to allow internal communication between all workers, as well as with the coordinator, on all the HTTP ports configured by the different http-rest-port values in the catalogs.
When the cluster starts, Starburst Warp Speed parses all configuration parameters and can emit spurious warnings such as Configuration property 'cache-service.password' was not used. You can safely ignore these warnings.
AWS deployment scenarios#
You can deploy Starburst Warp Speed on AWS using three different approaches, depending on your privileges and node configuration.
Scenario 1: Full privileges deployment#
If your privileges let you execute pre-boot scripts, configure the NVMe SSDs before you start the cluster. This allows optimal performance and control over your storage configuration.
Pre-deployment configuration#
Add the following preBootstrapCommands section to your EKS managed node group
configuration for both the worker and coordinator nodes:
preBootstrapCommands:
  - "yum install -y mdadm"
  - "sysctl -w fs.aio-max-nr=8388608 >> /etc/sysctl.conf"
  - 'devices=""; for device in $(ls /sys/block/); do if [[ $(grep -e "Amazon EC2 NVMe Instance Storage" -e "ec2-nvme-instance" /sys/block/${device}/device/subsysnqn -c 2> /dev/null) -gt 0 ]]; then devices="${devices} /dev/${device}"; fi; done; echo ${devices} > /tmp/devices'
  - "mdadm --create /dev/md0 $(cat /tmp/devices) --level=0 --force --raid-devices=$(cat /tmp/devices | wc -w)"
  - "mkfs.ext4 /dev/md0 -O ^has_journal"
  - "mkdir -p /opt/data"
  - "mount /dev/md0 /opt/data"
  - "chmod 777 -R /opt/data"
Note
The aio-max-nr parameter specifies the maximum number of asynchronous I/O
(AIO) events the system allows. Set this value to 8388608 to support
high-performance workloads.
Helm configuration#
The following shows the Warp Speed-specific properties for this scenario. For all available configuration options and their defaults, see General configuration:
warpSpeed:
  enabled: true
  autoConfigure: false # Pre-boot script handles configuration
  fileSystem:
    localStorageMountPath: /opt/data
  image:
    repository: "harbor.starburstdata.net/starburstdata/starburst-warpspeed-init"
    tag: "1.0.19"
    pullPolicy: "IfNotPresent"
Deployment#
Use the following command to deploy Starburst Warp Speed using Helm:
helm upgrade sep oci://harbor.starburstdata.net/starburstdata/charts/starburst-enterprise \
  --namespace <namespace> --install --version <version> --values YAML/values.yaml
Scenario 2: Limited privileges deployment#
If your privileges do not let you execute pre-boot scripts, configure NVMe storage for Starburst Warp Speed using a specialized container image. This lets you manage storage configuration from within the Kubernetes environment.
This deployment scenario uses a container image with the -awsprivileged suffix
that has the permissions to automatically configure the NVMe drives.
Helm configuration#
In your values.yaml file, set the following values:
warpSpeed:
  enabled: true
  autoConfigure: true
  additionalHeapSizePercentage: 15
  fileSystem:
    localStorageMountPath: /opt/data
  image:
    repository: "harbor.starburstdata.net/starburstdata/starburst-warpspeed-init"
    tag: "1.0.19-awsprivileged"
    pullPolicy: "IfNotPresent"
Deployment#
Use the following command to deploy Starburst Warp Speed using Helm:
helm upgrade sep oci://harbor.starburstdata.net/starburstdata/charts/starburst-enterprise \
  --namespace <namespace> --install --version <version> --values YAML/values.yaml
Scenario 3: Bottlerocket OS deployment#
If you are using AWS Bottlerocket nodes, configure the disk setup through the Bottlerocket bootstrap container system. This lets you prepare NVMe storage for Starburst Warp Speed in Bottlerocket’s container-optimized environment.
EC2 user data configuration#
Add the following to your Amazon EC2 user data:
[settings.bootstrap-containers.disk-setup]
essential = true
mode = "once"
source = "harbor.starburstdata.net/starburstdata/starburst-warpspeed-init:1.0.17-bottlerocket"
If necessary, include your Harbor credentials:
[[settings.container-registry.credentials]]
registry = "harbor.starburstdata.net"
username = "some_user"
password = "a_password"
Note
You must have Harbor registry credentials to access the Starburst Warp Speed init container images.
Helm configuration#
In your values.yaml file, set the following values:
warpSpeed:
  enabled: true
  autoConfigure: false
  additionalHeapSizePercentage: 15
  fileSystem:
    localStorageMountPath: /mnt/opt/data
  image:
    repository: "harbor.starburstdata.net/starburstdata/starburst-warpspeed-init"
    tag: "1.0.19"
    pullPolicy: "IfNotPresent"
Deployment#
Use the following command to deploy Starburst Warp Speed using Helm:
helm upgrade sep oci://harbor.starburstdata.net/starburstdata/charts/starburst-enterprise \
  --namespace <namespace> --install --version <version> --values YAML/values.yaml
Verification#
Verify your configuration with the following steps.
NVMe drives#
Verify that Starburst Warp Speed properly detects your NVMe drives:
ls /sys/block/
cat /sys/block/<device>/device/subsysnqn
If the output does not contain "Amazon EC2 NVMe Instance Storage" or "ec2-nvme-instance", update the disk filter list in your Helm configuration:
warpSpeed:
  fileSystem:
    diskFilterStringList:
      - "custom-nvme-identifier"
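The filter matching performed by the init container can be approximated with a small shell check; matches_disk_filter is an illustrative helper, not part of Starburst Warp Speed:

```shell
# Illustrative check that mirrors the documented disk filter strings.
# matches_disk_filter is a hypothetical helper, not a Starburst tool.
matches_disk_filter() {
  case "$1" in
    *"Amazon EC2 NVMe Instance Storage"*|*"ec2-nvme-instance"*) echo "matched" ;;
    *) echo "unmatched" ;;
  esac
}

result=$(matches_disk_filter "nqn.2014.08.org.nvmexpress:Amazon EC2 NVMe Instance Storage")
echo "$result"
```

If your drives report a different subsystem name, add that string to diskFilterStringList so the init container can detect them.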
Storage mount#
Check the storage:
kubectl exec -it <worker-pod-name> -- df -ah
Look for a mount at /opt/data (or /mnt/opt/data for Bottlerocket) with
significant storage space available.
On-premises deployment#
To use Starburst Warp Speed with an on-premises deployment, use the following steps to configure RAID 0 across the local NVMe drives on each node.
List available block devices:
lsblk
Create the RAID 0 array:
sudo mdadm --create /dev/md0 /dev/nvme0n1 /dev/nvme1n1 --level=0 --force --raid-devices=2
Format the RAID volume:
sudo mkfs.ext4 /dev/md0 -O ^has_journal
Create and mount the directory for Starburst Warp Speed data:
sudo mkdir -p /opt/wsdata
sudo mount /dev/md0 /opt/wsdata
sudo chown -R <username> /opt/wsdata
sudo chmod -R 777 /opt/wsdata
Catalog configuration#
After a successful Cluster configuration, you can configure the desired catalogs to use Starburst Warp Speed.
Only catalogs using the Hive, Iceberg, or Delta Lake connectors can be accelerated:
connector.name=hive
connector.name=iceberg
connector.name=delta_lake
For more details, see Delta Lake considerations, Iceberg considerations, and Hive considerations.
Only catalogs backed by S3, S3-compatible, GCS, and ADLS object storage are supported. For more details, see S3 considerations, GCS considerations, and ADLS considerations.
Update the example catalog that uses the Hive connector with AWS Glue in the
values file.
catalogs:
  example: |
    connector.name=hive
    hive.metastore=glue
    ...
Enable Starburst Warp Speed on the catalog by updating the connector name to warp_speed and
adding the required configuration properties:
catalogs:
  example: |
    connector.name=warp_speed
    warp-speed.proxied-connector=hive
    warp-speed.cluster-uuid=example-cluster-567891234567
    # Do not configure the following property if you are using a native
    # filesystem implementation.
    hive.metastore=glue
    ...
The properties setting the connector name, the proxied connector, and the cluster identifier are required.
The shared secret must be set to the same value as the cluster's secret configured in sharedSecret:. This is required unless the REST API is disabled.
For testing purposes, or for permanent use of a new catalog name such as faster in parallel to the existing catalog, you can copy a catalog's configuration and update it:
catalogs:
  example: |
    connector.name=hive
    hive.metastore=glue
    ...
  faster: |
    connector.name=warp_speed
    warp-speed.proxied-connector=hive
    warp-speed.cluster-uuid=example-cluster-567891234567
    # Do not configure the following property if you are using a native
    # filesystem implementation.
    hive.metastore=glue
    ...
This allows you to query the same data with or without Starburst Warp Speed using different
catalog names. However, existing scripts and statements that include the old
catalog name example are not accelerated.
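For example, the same table can then be queried through either catalog; the schema and table names below are placeholders:

```sql
-- Served directly from object storage:
SELECT count(*) FROM example.sales.orders;

-- Accelerated by Starburst Warp Speed after warmup:
SELECT count(*) FROM faster.sales.orders;
```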
File system cache#
Starburst Warp Speed uses the file system cache for data caching. For the caching component of Starburst Warp Speed to function, you must add the following fs.cache properties to each catalog that uses Starburst Warp Speed:
catalogs:
  example: |
    connector.name=warp_speed
    ...
    fs.cache.enabled=true
    fs.cache.directories=/opt/data/fsc
    fs.cache.max-disk-usage-percentages=90
    ...
Note
Starburst recommends using the default /opt/data/fsc prefix followed by
<catalog-name>/. The directory for fs.cache.directories must be unique for
each catalog.
Warning
When configuring multiple Warp Speed catalogs, ensure that the combined values of fs.cache.max-disk-usage-percentages across all catalogs do not exceed 100 percent. For more information, see File system cache.
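For example, two Warp Speed catalogs sharing the same drive could split the budget evenly. The catalog names and the 45 percent values below are illustrative choices, not recommendations:

```yaml
catalogs:
  sales: |
    connector.name=warp_speed
    ...
    fs.cache.enabled=true
    fs.cache.directories=/opt/data/fsc/sales/
    fs.cache.max-disk-usage-percentages=45
  finance: |
    connector.name=warp_speed
    ...
    fs.cache.enabled=true
    fs.cache.directories=/opt/data/fsc/finance/
    fs.cache.max-disk-usage-percentages=45
```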
Catalog configuration properties#
The following table provides more information about the available catalog configuration properties:
| Property name | Description |
|---|---|
| connector.name | Required. Must be set to warp_speed. |
| warp-speed.proxied-connector | Required. The type of embedded connector that is used for accessing cold data through Starburst Warp Speed. Valid values are hive, iceberg, and delta_lake. |
| warp-speed.cluster-uuid | Required. Unique identifier of the cluster. Used as the folder name in the store path. Use the same value for all catalogs. |
| warp-speed.config.internal-communication.shared-secret | The shared secret value of the cluster. It is configured for secure internal communication in sharedSecret:. |
| http-rest-port | Enables the REST API server on the coordinator and each worker for each catalog. Add a unique http-rest-port value for each catalog. |
| warp-speed.config.extensions.enabled | Enable or disable Starburst Warp Speed extensions. Defaults to false. |
Hive considerations#
Most configurations of the Hive connector are supported. Additionally, the following considerations apply when using the Hive connector as the proxied connector for Starburst Warp Speed:
Materialized views are supported.
S3 proxy is not supported.
Server-side encryption with S3 managed keys and KMS managed keys is supported. S3 client-side encryption is not supported.
ORC ACID transactional tables are not supported.
For optimal performance, add the following properties to your catalog configuration:
catalogs:
  example: |
    ...
    hive.max-outstanding-splits-size=512MB
    hive.max-initial-splits=0
    hive.max-outstanding-splits=3000
    ...
Iceberg considerations#
All configurations of the Iceberg connector are supported. Additionally, the following considerations apply when using the Iceberg connector as the proxied connector for Starburst Warp Speed:
Materialized views are supported.
Server-side encryption with S3 managed keys is supported. S3 client-side encryption is not supported.
An associated split is served from object storage and no acceleration occurs for:
Row-level update or delete operations.
Merge operations that cause a record update.
Delta Lake considerations#
All configurations of the Delta Lake connector are supported. Additionally, the following considerations apply when using the Delta Lake connector as the proxied connector for Starburst Warp Speed:
Materialized views are not supported.
Server-side encryption with S3 managed keys is supported. S3 client-side encryption is not supported.
An associated split is served from object storage and no acceleration occurs for:
Row-level update or delete operations.
Merge operations that cause a record update.
For optimal performance, add the following properties to your catalog configuration:
catalogs:
  example: |
    ...
    delta.max-outstanding-splits=3000
    ...
S3 considerations#
Starburst Warp Speed supports Amazon S3 with catalogs using the Hive, Iceberg, and Delta Lake connectors.
Using the s3:// protocol is required.
GCS considerations#
Starburst Warp Speed supports Google Cloud Storage (GCS).
Authentication to GCS can use a JSON key file or an OAuth 2.0 access token configured identically for the Hive, Delta Lake, or Iceberg connector in the catalog properties:
hive.gcs.json-key-file-path=/path/to/gcs_keyfile.json
hive.gcs.use-access-token=false
For more information about authorization, refer to Google Cloud Service accounts documentation.
ADLS considerations#
Starburst Warp Speed supports Microsoft ADLS Gen2. ADLS Gen1 is not supported.
Using the abfs:// or abfss:// protocol is required.
ADLS can be used with catalogs using the Hive and Delta Lake connectors with the following configuration properties to connect to Azure storage:
catalogs:
  faster: |
    ...
    hive.azure.abfs-storage-account=<storage_account_name>
    hive.azure.abfs-access-key=xxx
    ...
It is possible to secure the connection with TLS and use the abfss protocol in the URI syntax.
Cluster management#
Starburst Warp Speed accommodates cluster expansion and contraction. Be aware of the following when scaling up or down:
When scaling a cluster horizontally (adding or removing worker nodes), Starburst Warp Speed continues operating, assuming that requirements are properly fulfilled. A cluster restart is not required when adding or removing nodes.
Scaling a cluster vertically to use larger nodes requires a cluster restart, which facilitates the replacement of all worker nodes to the larger node size.
After restarting the cluster, the default acceleration becomes active. New caches and indexes get created and populated based on the query workload.
Default acceleration#
When a query accesses a column that is not accelerated, the system performs data and index materialization on the cluster to accelerate future access to the data in the column. This process of creating the indexes and caches is also called warmup. Warmup is performed individually by each worker, based on the splits it processes, and uses the worker's local high-performance storage, typically NVMe SSDs.
When new data is added to a table or the index and cache creation are in progress, the new portions of the table that are not accelerated are served from the object storage. After the asynchronous indexing and caching is complete, query processing accessing that data is accelerated, because the data is available directly in the cluster from the indexes and caches, and no longer has to be retrieved from the remote object storage.
This results in immediately improved performance for recently used datasets.
Default acceleration is performed for SELECT * FROM <table_name> queries
that are commonly used to explore a table rather than to retrieve specific data.
Acceleration types#
Starburst Warp Speed uses different types of acceleration to improve query processing performance. These acceleration types are used automatically by default acceleration.
Index acceleration#
Index acceleration uses the data in a specific column in a table to create an index. This index is added to the row group and used when queries access a column to filter rows. It accelerates queries that use predicates, joins, filters, and searches, and minimizes data scanning.
The index types (such as bitmap, tree, and others), are determined automatically by the column data types, and data patterns and characteristics.
Index acceleration activates only when the index filters out a sufficient proportion of rows. During query execution, Starburst Warp Speed compares rows returned by the index to total rows. If the proportion of filtered rows does not meet the data retention threshold, the remaining splits proceed without index acceleration.
Use the WARM_UP_TYPE_BASIC value in the warmUpType property to configure
index acceleration for a specific column with the REST API.
Text search acceleration#
Text search acceleration creates an index of the content of text columns using Apache Lucene. This index is used in query predicates. It accelerates queries that use predicates of filters and searches on text columns.
Starburst Warp Speed automatically enables text search acceleration, and maintains the indexes.
Text search acceleration uses Apache Lucene
indexing to accelerate text analytics and provide fast text filters,
particularly with LIKE predicates. The
KeywordAnalyzer
provides full support for LIKE semantics to search for the exact appearance of
a value in a filtered column.
A use case is a search for a specific short string in a larger column, such as a
description. For example, consider a table with a column named city and a
value New York, United States. The index is case-sensitive. When indexing is
applied to the column, the following query returns that record because the
LIKE predicate is an exact match:
SELECT *
FROM tbl
WHERE city LIKE '%New York%'
The following queries do not return the record because the LIKE predicates
are not an exact match. The first query is missing a space in the pattern:
SELECT *
FROM tbl
WHERE city LIKE '%NewYork%'
The second query uses lowercase:
SELECT *
FROM tbl
WHERE city LIKE '%new york%'
Text search acceleration indexing is recommended for:
Queries with LIKE predicates, prefix or suffix queries, or queries that use the starts_with function.
Range queries on string columns. A common use is dates stored as strings with range predicates, for example date_string >= 'yyyy-mm-dd'.
Text search acceleration indexing supports the following data types:
CHAR
VARCHAR
CHAR ARRAY
VARCHAR ARRAY
Use the WARM_UP_TYPE_LUCENE value in the warmUpType property to
configure text search acceleration for a specific column with the REST API.
Limitations:
The maximum supported string length is 33k characters.
Queries with nested expressions, such as starts_with(some_nested_method(col1), 'aaa'), are not accelerated.
Query predicates can contain a maximum of 128 unique columns.
Warp Speed REST API#
The following sections describe how to configure and use the Starburst Warp Speed REST API to monitor and manage indexing behavior. Before using any of the endpoints described below, ensure that you have completed the configuration steps in REST API access.
REST API access#
The Starburst Warp Speed REST API lets you configure and monitor indexing behavior. Note that the REST API configuration affects only the indexing portion of Warp Speed; it does not affect data cached by the file system cache.
The REST API is not enabled by default. To enable it, set the following in each catalog configuration:
warp-speed.config.extensions.enabled=true
warp-speed.config.internal-communication.shared-secret=<shared-secret>
The shared secret must match the value configured for secure internal
communication in sharedSecret: for your Kubernetes deployment. If you are
using a native file system implementation, do not set the
shared secret property.
Once enabled, the REST API is available on the coordinator at the same port and domain as the SEP web UI, with a separate context for each catalog:
/ext/{catalogName}/{endpoint}
Access to the Starburst Warp Speed REST API is controlled by the same authentication and authorization as the Starburst Enterprise REST API.
REST API overview#
The following sections detail the REST API and available endpoints. The examples use plain curl calls against the endpoints for the faster catalog on the cluster at sep.example.com, using HTTPS and omitting any authentication.
Warming status#
You can determine the status of the warmup for Starburst Warp Speed with a GET operation on the
/warming/status endpoint. It reports the warmup progress for splits across
workers, and whether warming is currently taking place.
curl -X GET 'https://sep.example.com/ext/faster/warming/status' \
-H 'Accept: application/json'
Example response:
{"nodesStatus":
{"172.31.16.98": {"started":22136,"finished":22136},
"172.31.25.207":{"started":20702,"finished":20702},
"172.31.19.167":{"started":21116,"finished":21116},
"172.31.22.28":{"started":20678,"finished":20678}},
"warming":false}
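To wait for warmup to finish, you can poll this endpoint until warming is false. A minimal check of the payload can be sketched with grep, shown here against an abbreviated example response; a JSON-aware tool such as jq would be more robust in practice:

```shell
# Abbreviated sample payload from the /warming/status endpoint.
response='{"nodesStatus":{"172.31.16.98":{"started":22136,"finished":22136}},"warming":false}'

# Hypothetical readiness check: report completion once warming is false.
if echo "$response" | grep -q '"warming":false'; then
  status="complete"
else
  status="in progress"
fi
echo "warmup: $status"
```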
The response shows that warmup started and finished on four workers, and is currently not in progress.
Debug tools#
The debug-tools endpoint requires an HTTP POST to specify the detailed
command with a JSON payload to retrieve the desired data. You can use it to
return the storage utilization:
curl -X POST "https://sep.example.com/ext/faster/debug-tools" \
-d '{"commandName" : "all","@class" : "io.trino.plugin.warp.execution.debugtools.DebugToolData"}' \
-H 'Content-Type: application/json'
Example response:
{"coordinator-container":
{"result":
{"Storage_capacity":15000000,
"Allocated 8k pages":1000000,
"Num used stripes":0
}
}
}
Calculate the storage utilization percentage with (Allocated 8k pages / Storage_capacity) * 100.
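Using the values from the example response above, the calculation can be sketched in shell arithmetic; note that integer division truncates the result:

```shell
# Values taken from the example debug-tools response.
allocated_pages=1000000
storage_capacity=15000000

# (Allocated 8k pages / Storage_capacity) * 100, truncated to an integer percentage.
utilization=$(( allocated_pages * 100 / storage_capacity ))
echo "storage utilization: ${utilization}%"
```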
Debug tools are blocked and cannot be used during warming.
Row group count#
A row group in Starburst Warp Speed is a collection of index and cache elements that are used to accelerate processing of Trino splits from the SSD storage.
Note
A row group in Starburst Warp Speed is not equivalent to a Parquet row group or an ORC stripe, but a higher level artifact specific to Starburst Warp Speed. It can be related to a specific Parquet row group or ORC stripe but can also represent data from a whole file or more.
The row-group/row-group-count endpoint exposes all currently warmed up
columns via an HTTP GET:
curl -X GET "https://sep.example.com/ext/faster/row-group/row-group-count" \
-H "accept: application/json"
The result is a list of columns with schema.table.column.warmuptype as the key. The value is the corresponding count of accelerated row groups. Warmup types:
WARM_UP_TYPE_BASIC represents index acceleration.
WARM_UP_TYPE_LUCENE represents text search acceleration.
In the following example, 20 row groups of the tripid column of the
trips_data table in the trips schema are accelerated with an index.
{
  "trips.trips_data.tripid.WARM_UP_TYPE_BASIC": 20
}
SQL support#
All SQL statements and functions supported by the connector used in the accelerated catalog are supported by Starburst Warp Speed.
Starburst Warp Speed supports all data types, including structural data
types. All structural data types are accessible, but indexing is only applicable
to fields within ROW data types.
For some functions, Starburst Warp Speed does not accelerate filtering operations on columns. For example, the following filtering operation is not accelerated:
SELECT count(*)
FROM catalog.schema.table
WHERE lower(company) = 'starburst';
Starburst Warp Speed indexing accelerates the following functions when used on the left or the right side of the predicate:
ceil(x) with REAL and DOUBLE data types
is_nan(x) with REAL and DOUBLE data types
cast(x as type) with DOUBLE cast to REAL, or any type cast to VARCHAR
cast(x as type) with DOUBLE and DECIMAL data types
day(d) and day_of_month(d) with DATE and TIMESTAMP data types
day_of_year(d) and doy(d) with DATE and TIMESTAMP data types
day_of_week(d) and dow(d) with DATE and TIMESTAMP data types
year(d) with DATE and TIMESTAMP data types
year_of_week(d) and yow(d) with DATE and TIMESTAMP data types
week(d) and week_of_year(d) with DATE and TIMESTAMP data types
LIKE and NOT LIKE with VARCHAR data type
contains(arr_varchar, value) with array of VARCHAR data type
substring and substr with VARCHAR data type
strpos with BIGINT data type
The maximum supported string length for any cached data type is 48000 characters.