Starburst catalog#
Starburst Enterprise automatically creates the starburst catalog during system
startup.
Note
You cannot configure the starburst catalog using properties files. If you have
an existing starburst.properties file or if you previously created a
starburst catalog using dynamic catalogs, you must remove these configurations;
otherwise, startup fails.
You must migrate all configuration properties set in starburst.properties to
config.properties on both coordinator and worker nodes.
Storage functions#
Use storage functions to read from or write to external object storage locations directly within your SQL queries.
Storage functions support Amazon S3, Google Cloud Storage, and Azure Data Lake Storage.
Note
Storage functions are available as a public preview feature. Contact your Starburst account team with questions or feedback.
Configuration#
Storage functions require credentials to authenticate with object storage
systems. You must specify exactly one of the following properties in
config.properties:
| Property name | Description |
|---|---|
| io.credentials-key | Base64-encoded credential key |
| io.credentials-file | Path to a JSON credential file |
Warning
You must specify either io.credentials-key or io.credentials-file, but not
both. If you set both properties or neither property, the system fails to start.
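For example, a minimal configuration that points to a credential file adds a single line to config.properties on the coordinator and all worker nodes. The file path shown here is illustrative; use the location of your own credential file:
# Illustrative path; replace with the location of your credential file
io.credentials-file=/etc/starburst/credentials.json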
Credential file format#
The credential file uses JSON format to define storage configurations. The connector matches locations based on prefix comparison and selects the most specific match.
Note
The connector throws an exception if a specified location matches no configuration or matches multiple configurations. Ensure your location prefixes are specific enough to avoid conflicts.
{
"configurations": [
{
"id": "s3-us-east-1",
"location": "s3://my-bucket-east/",
"configuration": {
"fs.native-s3.enabled": "true",
"s3.aws-access-key": "YOUR_ACCESS_KEY",
"s3.aws-secret-key": "YOUR_SECRET_KEY",
"s3.region": "us-east-1"
}
},
{
"id": "gcs-bucket",
"location": "gs://my-gcs-bucket/",
"configuration": {
"fs.native-gcs.enabled": "true",
"gcs.json-key": "YOUR_JSON_KEY"
}
},
{
"id": "azure-container",
"location": "abfs://container@account.dfs.core.windows.net/",
"configuration": {
"fs.native-azure.enabled": "true",
"azure.auth-type": "ACCESS_KEY",
"azure.access-key": "YOUR_ACCESS_KEY"
}
}
]
}
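The location that you pass to a storage function selects a configuration by prefix. With the example file above, any path under s3://my-bucket-east/ resolves to the s3-us-east-1 configuration. The following sketch uses the io.load function described later on this page; the object path is hypothetical:
SELECT * FROM TABLE(
  io.load(
    location => 's3://my-bucket-east/sales/'
  )
)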
To generate the base64-encoded key from a credential file, use the following command:
cat credentials.json | base64
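GNU coreutils base64 wraps its output at 76 characters by default. If the encoded key must appear as a single-line property value, you can disable wrapping:
base64 -w 0 credentials.json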
UNLOAD#
The UNLOAD function is a table function that exports query results directly to
external storage locations.
The function is available in the io schema for Starburst functions.
Warning
Fault-tolerant clusters do not support the UNLOAD function.
Syntax#
SELECT * FROM TABLE(io.unload(
input => TABLE(...) [PARTITION BY col [, ...]],
location => 'storage_path',
format => 'file_format'
[, compression => 'compression_type']
[, separator => 'delimiter']
[, header => true|false]
)
)
Parameters#
The following parameters are required:
- input: The source data
- location: The target storage path
- format: The output file format
The following parameters are optional:
- compression: The compression algorithm to apply to files. Choose from the following options: NONE (default), SNAPPY, LZ4, ZSTD, GZIP
- separator: Field delimiter for CSV and TEXTFILE formats
- header: Boolean that specifies whether to include column headers in the output file when using CSV and TEXTFILE formats. Defaults to false.
Supported formats#
The UNLOAD function supports the following file formats:
- ORC
- PARQUET
- AVRO
- CSV
- JSON
- TEXTFILE
- RCBINARY
- RCTEXT
- SEQUENCEFILE
- OPENX_JSON
For more information, see Object storage file formats.
Examples#
The following example unloads the orderkey, custkey, and orderstatus
columns from the tpch.sf1.orders table to the
s3://mybucket/my/unload/location folder in ORC file format:
SELECT
*
FROM
TABLE (
io.unload (
input => TABLE (
SELECT
orderkey,
custkey,
orderstatus
FROM
tpch.sf1.orders
)
PARTITION BY
(orderstatus),
location => 's3://mybucket/my/unload/location',
format => 'ORC'
)
)
The following example unloads the entire tpch.sf1.orders table:
SELECT
*
FROM
TABLE (
io.unload (
input => TABLE (tpch.sf1.orders)
PARTITION BY
(orderstatus),
location => 's3://mybucket/my/unload/location',
format => 'ORC'
)
)
The following example unloads the table into multiple partitions:
SELECT
*
FROM
TABLE (
io.unload (
input => TABLE (
SELECT
orderkey,
custkey,
orderdate,
orderstatus
FROM
tpch.sf1.orders
)
PARTITION BY
(orderdate, orderstatus),
location => 's3://mybucket/my/unload/location',
format => 'TEXTFILE',
compression => 'GZIP',
separator => '|'
)
)
The following example unloads the table as a compressed, delimited text file with column headers:
SELECT
*
FROM
TABLE (
io.unload (
input => TABLE (
SELECT
orderkey,
custkey,
orderstatus
FROM
tpch.sf1.orders
)
PARTITION BY
(orderstatus),
location => 's3://mybucket/my/unload/location',
format => 'TEXTFILE',
compression => 'GZIP',
separator => '|',
header => true,
)
)
LOAD#
The LOAD function is a table function that reads data from external storage
locations directly into queries.
The function is available in the io schema for Starburst functions.
Warning
Fault-tolerant clusters do not support the LOAD function.
Syntax#
SELECT * FROM TABLE(io.load(
location => 'storage_path'
[, format => 'file_format']
[, columns => DESCRIPTOR(column_name type, ...)]
)
)
Parameters#
The following parameter is required:
- location: The source storage path
The following parameters are optional:
- format: The input file format
- columns: Defines column names and types
Note
You must specify both the format and columns parameters together, or omit
both. When you omit these parameters, the LOAD function automatically
discovers the format and schema.
Supported formats#
The LOAD function supports the following file formats:
- ORC
- PARQUET
- AVRO
- CSV
- JSON
- TEXTFILE
- RCBINARY
- RCTEXT
- SEQUENCEFILE
- OPENX_JSON
For more information, see Object storage file formats.
Examples#
The following example loads Parquet files from a storage location and relies on automatic format and schema discovery:
SELECT * FROM TABLE(
io.load(
location => 's3://mybucket/data/'
)
)
The following example loads ORC files from a directory and defines an explicit schema:
SELECT * FROM TABLE(
io.load(
location => 's3://mybucket/data/orc_files/',
format => 'ORC',
columns => DESCRIPTOR(
"customer_id" BIGINT,
"customer_name" VARCHAR,
"order_date" DATE
)
)
)
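Because the LOAD function returns a table, you can use its output like any other table in joins, filters, and aggregations. The following sketch counts rows per customer, reusing the hypothetical location and schema from the previous example:
SELECT
  customer_id,
  count(*) AS row_count
FROM
  TABLE (
    io.load (
      location => 's3://mybucket/data/orc_files/',
      format => 'ORC',
      columns => DESCRIPTOR(
        "customer_id" BIGINT,
        "customer_name" VARCHAR,
        "order_date" DATE
      )
    )
  )
GROUP BY
  customer_id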