Storage functions#
Use storage functions to read from or write to external object storage locations directly within your SQL queries.
Storage functions support Amazon S3, Google Cloud Storage, Azure Storage, and local file systems.
Note
Storage functions are available as a public preview feature. Contact your Starburst account team with questions or feedback.
Configuration#
To use storage functions, you must configure file system access and credentials.
File system access#
You must enable and configure file system access. Storage functions support the following file systems:
- Amazon S3
- Google Cloud Storage
- Azure Storage
- Local file system
Credentials#
Storage functions require credentials to authenticate with object storage
systems. You must specify exactly one of the following properties in
config.properties:
| Property name | Description |
|---|---|
| io.credentials-key | Base64-encoded credential key |
| io.credentials-file | Path to a JSON credential file |
Warning
You must specify either io.credentials-key or io.credentials-file, but not
both. If you set both properties or neither property, the system fails to start.
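For example, a minimal config.properties entry that references a credential file might look like the following sketch. The path is a placeholder, not a required location:
# Placeholder path; point this at your actual credential file
io.credentials-file=/etc/starburst/storage-credentials.json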
Credential file format#
The credential file uses JSON format to define storage configurations. The connector matches locations by prefix comparison and selects the most specific match. For example, given the configurations in the example below, the location s3://my-bucket-east/data/orders/ matches the s3-us-east-1 configuration because it begins with the s3://my-bucket-east/ prefix.
Note
The connector throws an exception if a specified location matches no configuration or matches multiple configurations. Ensure your location prefixes are specific enough to avoid conflicts.
{
  "configurations": [
    {
      "id": "s3-us-east-1",
      "location": "s3://my-bucket-east/",
      "configuration": {
        "fs.native-s3.enabled": "true",
        "s3.aws-access-key": "YOUR_ACCESS_KEY",
        "s3.aws-secret-key": "YOUR_SECRET_KEY",
        "s3.region": "us-east-1"
      }
    },
    {
      "id": "gcs-bucket",
      "location": "gs://my-gcs-bucket/",
      "configuration": {
        "fs.native-gcs.enabled": "true",
        "gcs.json-key": "YOUR_JSON_KEY"
      }
    },
    {
      "id": "azure-container",
      "location": "abfs://container@account.dfs.core.windows.net/",
      "configuration": {
        "fs.native-azure.enabled": "true",
        "azure.auth-type": "ACCESS_KEY",
        "azure.access-key": "YOUR_ACCESS_KEY"
      }
    }
  ]
}
To generate the base64-encoded key from a credential file, use the following command:
cat credentials.json | base64
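You can then set the output as the value of the io.credentials-key property. The value in the following sketch is a placeholder for the actual base64 output:
# Placeholder value; replace with the output of the base64 command
io.credentials-key=BASE64_ENCODED_CREDENTIALS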
starburst.io.unload#
The starburst.io.unload function is a table function that exports query
results directly to external storage locations.
Warning
Fault-tolerant clusters do not support the starburst.io.unload function.
Syntax#
SELECT * FROM TABLE(starburst.io.unload(
    input => TABLE(...) [PARTITION BY col [, ...]],
    location => 'storage_path',
    format => 'file_format'
    [, compression => 'compression_type']
    [, separator => 'delimiter']
    [, header => true|false]
  )
)
Parameters#
The following parameters are required:
- input: The source data
- location: The target storage path
- format: The output file format
The following parameters are optional:
- compression: The compression algorithm to apply to files. Choose from the following options:
  - NONE (default)
  - SNAPPY
  - LZ4
  - ZSTD
  - GZIP
- separator: Field delimiter for CSV and TEXTFILE formats
- header: Boolean that specifies whether to include column headers in the output file when using CSV and TEXTFILE formats. Defaults to false.
Supported formats#
The starburst.io.unload function supports the following file formats:
- ORC
- PARQUET
- AVRO
- CSV
- JSON
- TEXTFILE
- RCBINARY
- RCTEXT
- SEQUENCEFILE
- OPENX_JSON
For more information, see Object storage file formats.
Examples#
The following example unloads the orderkey, custkey, and orderstatus
columns from the tpch.sf1.orders table to the
s3://mybucket/my/unload/location folder in ORC file format:
SELECT * FROM TABLE(
  starburst.io.unload(
    input => TABLE(
      SELECT orderkey, custkey, orderstatus
      FROM tpch.sf1.orders
    ) PARTITION BY (orderstatus),
    location => 's3://mybucket/my/unload/location',
    format => 'ORC'
  )
)
The following example unloads the entire tpch.sf1.orders table:
SELECT * FROM TABLE(
  starburst.io.unload(
    input => TABLE(tpch.sf1.orders)
      PARTITION BY (orderstatus),
    location => 's3://mybucket/my/unload/location',
    format => 'ORC'
  )
)
The following example unloads the table partitioned by multiple columns:
SELECT * FROM TABLE(
  starburst.io.unload(
    input => TABLE(
      SELECT orderkey, custkey, orderdate, orderstatus
      FROM tpch.sf1.orders
    ) PARTITION BY (orderdate, orderstatus),
    location => 's3://mybucket/my/unload/location',
    format => 'TEXTFILE',
    compression => 'GZIP',
    separator => '|'
  )
)
The following example unloads the table as a compressed, delimited text file with column headers:
SELECT * FROM TABLE(
  starburst.io.unload(
    input => TABLE(
      SELECT orderkey, custkey, orderstatus
      FROM tpch.sf1.orders
    ) PARTITION BY (orderstatus),
    location => 's3://mybucket/my/unload/location',
    format => 'TEXTFILE',
    compression => 'GZIP',
    separator => '|',
    header => true
  )
)
starburst.io.load#
The starburst.io.load function is a table function that reads data from
external storage locations directly into queries.
Warning
Fault-tolerant clusters do not support the starburst.io.load function.
Syntax#
SELECT * FROM TABLE(starburst.io.load(
    location => 'storage_path'
    [, format => 'file_format']
    [, columns => DESCRIPTOR(column_name type, ...)]
  )
)
Parameters#
The following parameter is required:
- location: The source storage path
The following parameters are optional:
- format: The input file format
- columns: Defines column names and types
Note
You must specify both the format and columns parameters together, or omit
both. When you omit these parameters, the starburst.io.load function
automatically discovers the format and schema.
Supported formats#
The starburst.io.load function supports the following file formats:
- ORC
- PARQUET
- AVRO
- CSV
- JSON
- TEXTFILE
- RCBINARY
- RCTEXT
- SEQUENCEFILE
- OPENX_JSON
For more information, see Object storage file formats.
Examples#
The following example loads Parquet files from a location and uses automatic format and schema discovery:
SELECT * FROM TABLE(
  starburst.io.load(
    location => 's3://mybucket/data/'
  )
)
The following example loads ORC files from a directory and defines an explicit schema:
SELECT * FROM TABLE(
  starburst.io.load(
    location => 's3://mybucket/data/orc_files/',
    format => 'ORC',
    columns => DESCRIPTOR(
      "customer_id" BIGINT,
      "customer_name" VARCHAR,
      "order_date" DATE
    )
  )
)
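Because starburst.io.load is a table function, you can use its result anywhere a table is valid. The following sketch reuses the hypothetical bucket and schema from the previous example to filter and aggregate the loaded rows:
-- Reuses the hypothetical s3://mybucket/data/orc_files/ location and schema
SELECT customer_id, count(*) AS order_count
FROM TABLE(
  starburst.io.load(
    location => 's3://mybucket/data/orc_files/',
    format => 'ORC',
    columns => DESCRIPTOR(
      "customer_id" BIGINT,
      "customer_name" VARCHAR,
      "order_date" DATE
    )
  )
)
WHERE order_date >= DATE '2024-01-01'
GROUP BY customer_id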