Starburst Galaxy supports the following object storage table formats, each of which is described on a separate page:
In Starburst Galaxy, you do not connect to a table format’s catalog. Instead, you connect to a storage location, where Galaxy reads from, and can write to, any of the supported table formats found in that location. This feature is called Great Lakes connectivity.
For data consumers who want to read from any object storage file in any supported table format, there are no configuration steps or commands. Starburst Galaxy reads each file’s type and format from the object store’s associated metadata.
To check the format of a table, use the SHOW CREATE TABLE SQL statement and refer to the type property in the query result.
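For example, for a hypothetical table named customer_events, the result of the following statement includes the table format in its WITH clause, such as type = 'iceberg':
-- catalog, schema, and table names are placeholders
SHOW CREATE TABLE my_catalog.my_schema.customer_events;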
For data engineers who want to create new tables in an object storage catalog, Starburst Galaxy provides a simple way to specify the default format for all newly created tables.
When creating a new catalog, select from one of the options in the Default table format dialog. Your selection applies only to newly created tables, and does not convert any existing tables.
You can also use a SQL statement to specify an individual table’s format when creating that table. With a statement such as CREATE TABLE or CREATE TABLE AS, specify the type parameter with iceberg, delta, or hive as its argument. A table created this way overrides the catalog’s overall default table format. Examples are shown in the individual table format pages listed above, and in the minimal sketch that follows.
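As a minimal sketch, the following statement creates an Iceberg table regardless of the catalog’s default format; the catalog, schema, table, and column names are placeholders:
-- the type property selects the table format for this table only
CREATE TABLE my_catalog.my_schema.events (
    event_id BIGINT,
    event_time TIMESTAMP(6)
)
WITH (type = 'iceberg');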
Each table format has its own set of properties that are supported by Great Lakes connectivity as part of CREATE TABLE statements.
The following query lists the Starburst Galaxy table properties supported by Great Lakes connectivity:
SELECT * FROM system.metadata.table_properties;
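To narrow the results to the properties available in a single catalog, you can filter on the catalog_name column; my_catalog is a placeholder:
-- list only the table properties exposed by one catalog
SELECT property_name, description
FROM system.metadata.table_properties
WHERE catalog_name = 'my_catalog';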
Read more about the available table properties for each table format:
SQL support in a catalog using Great Lakes connectivity depends partly on the table format in use, and on the format set as the catalog’s default. The following additional details apply:
More specifics are available in the following sections about the different table formats and their SQL support.
The CREATE VIEW statement creates a new view based on a SELECT query. The view is a logical table that can be referenced by future queries. Views do not contain any data. Instead, the query stored by the view is executed every time the view is referenced by another query.
Views in Starburst Galaxy can be used to pull together federated data across any catalog. Read more about federating multiple data sources in Starburst Galaxy.
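As a sketch of federation, a view can join tables from two different catalogs. Here tpch.sf1.orders matches the sample table used in examples later on this page, while sales_lake.crm.customers is a hypothetical table in a second catalog:
-- a single view spanning two catalogs
CREATE VIEW my_catalog.my_schema.customer_orders AS
SELECT c.customer_name, o.orderkey, o.totalprice
FROM sales_lake.crm.customers c
JOIN tpch.sf1.orders o ON o.custkey = c.customer_id;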
The use of an OR REPLACE clause is optional and causes the view to be replaced if it already exists, rather than raising an error.
The following examples show how to create a view:
Create a view:
CREATE VIEW view_name AS
SELECT col1, col2, col3
FROM table_name;
Create a view with a view comment:
CREATE VIEW view_with_comment
COMMENT 'This is a comment'
AS
SELECT col1, col2, col3
FROM table_name;
Create a view that summarizes the table_name table:
CREATE VIEW view_summary AS
SELECT col2, sum(col3) AS col_total
FROM table_name
GROUP BY col2;
Create a view replacing an existing view:
CREATE OR REPLACE VIEW view_name AS
SELECT col1, col2, col3
FROM table_name;
Great Lakes connectivity also supports the following SQL statements for views:
There are two security modes for views: DEFINER and INVOKER. The default security mode is DEFINER.
CREATE VIEW view_name
COMMENT 'this is a comment'
SECURITY DEFINER
AS
SELECT col1, col2
FROM table_name;
The DEFINER security mode makes it possible to grant access to a view over underlying tables that the querying user is not allowed to access directly. Tables referenced in the view are accessed using the permissions of the view owner (the creator or definer of the view) rather than the user executing the query.
A view created in INVOKER security mode is a stored query. Tables referenced in the view are accessed using the permissions of the user executing the query (the invoker of the view).
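A minimal sketch of creating a view in INVOKER security mode, using the same placeholder names as the previous example:
CREATE VIEW view_name
SECURITY INVOKER
AS
SELECT col1, col2
FROM table_name;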
Create and validate the definition of a new materialized view view_name of a SELECT query. Run the REFRESH MATERIALIZED VIEW statement after creation to populate the materialized view with data. This materialized view is a physical representation of the query results at the time of refresh. The data is stored and can be referenced by future queries.
If the materialized view is made from an Iceberg table, it is only refreshed when there are new snapshots in the underlying tables.
Create a simple materialized view view_name over the table_name table that only includes column_data_1:
CREATE MATERIALIZED VIEW catalog_name.schema_name.view_name
AS
SELECT column_data_1 FROM catalog_name.schema_name.table_name;
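To populate the materialized view with data after creation, or to update it later, run a refresh against the same placeholder name:
REFRESH MATERIALIZED VIEW catalog_name.schema_name.view_name;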
Read about the optional clauses and view additional usage examples for the CREATE MATERIALIZED VIEW SQL statement.
Great Lakes connectivity also supports the following SQL statements for materialized views:
The CREATE SCHEMA SQL statement creates a new, empty schema. A schema is a container that holds tables, views, and other database objects.
Create a new schema by using the default directory for the catalog or an explicit LOCATION. The location of the schema must be retrievable through the SHOW CREATE SCHEMA statement. In addition, your active role set must have the location privilege for the intended location.
Assign location privileges by following these steps:
The following SQL statements are examples of creating a schema using the default directory location and using an explicit LOCATION:
Create a new schema schema_name in the default location:
CREATE SCHEMA schema_name;
Create a new schema schema_name_location, with LOCATION set to s3://bucket-name/lakehouse/data/schema_name_location:
CREATE SCHEMA schema_name_location
WITH (LOCATION = 's3://bucket-name/lakehouse/data/schema_name_location');
Create a new schema if it does not already exist:
CREATE SCHEMA IF NOT EXISTS schema_name;
Create a new schema schema_name, set the LOCATION property to s3://bucket-name/lakehouse/data/schema_name_location, and set the owner to user_name:
CREATE SCHEMA schema_name AUTHORIZATION user_name
WITH (LOCATION = 's3://bucket-name/lakehouse/data/schema_name_location');
View additional usage and clause examples for the CREATE SCHEMA SQL statement. Great Lakes connectivity also supports the following SQL statements for schemas:
The DROP SCHEMA SQL statement drops an existing schema that is empty.
Drop an empty schema:
DROP SCHEMA schema_name;
Use the CASCADE clause to drop all objects within a schema, including tables, views, and materialized views.
Drop a non-empty schema:
DROP SCHEMA schema_name CASCADE;
The UNLOAD SQL statement is a pre-built table function within the system schema that writes files directly to storage. The files that UNLOAD writes to storage correspond to what the input parameter selects.
SELECT * FROM TABLE(system.unload(
    input => TABLE(...) [PARTITION BY col (, ...)],
    location => '',
    format => ''
    [, compression => '']
    [, separator => '']
    [, header => true|false]
))
The input, location, and format parameters are required. The compression, separator, and header parameters are optional. The input parameter can accept either a table name or a SELECT query. The separator and header parameters are applicable only when the format argument is set to CSV or TEXTFILE. See File formats for more details.
Roles do not have access to the UNLOAD function by default. To enable role access, users must have the UNLOAD function privilege on the catalog, as well as the corresponding location privilege on the object storage location where the output is written.
To unload the orderkey, custkey, and orderstatus columns from the orders table to the location s3://mybucket/my/unload/location in ORC file format:
SELECT
    *
FROM
    TABLE (
        system.unload (
            input => TABLE (
                SELECT
                    orderkey,
                    custkey,
                    orderstatus
                FROM
                    tpch.sf1.orders
            )
            PARTITION BY
                (orderstatus),
            location => 's3://mybucket/my/unload/location',
            format => 'ORC'
        )
    )
An example of UNLOAD using input => TABLE(tpch.sf1.orders):
SELECT
    *
FROM
    TABLE (
        system.unload (
            input => TABLE (tpch.sf1.orders)
            PARTITION BY
                (orderstatus),
            location => 's3://mybucket/my/unload/location',
            format => 'ORC'
        )
    )
An example of UNLOAD with multiple partitions:
SELECT
    *
FROM
    TABLE (
        system.unload (
            input => TABLE (
                SELECT
                    orderkey,
                    custkey,
                    orderdate,
                    orderstatus
                FROM
                    tpch.sf1.orders
            )
            PARTITION BY
                (orderdate, orderstatus),
            location => 's3://mybucket/my/unload/location',
            format => 'TEXTFILE',
            compression => 'GZIP',
            separator => '|'
        )
    )
An example of UNLOAD with TEXTFILE format, a separator, and no compression:
SELECT
    *
FROM
    TABLE (
        system.unload (
            input => TABLE (
                SELECT
                    orderkey,
                    custkey,
                    orderstatus
                FROM
                    tpch.sf1.orders
            )
            PARTITION BY
                (orderstatus),
            location => 's3://mybucket/my/unload/location',
            format => 'TEXTFILE',
            separator => '|'
        )
    )
An example of UNLOAD with TEXTFILE format, a separator, compression, and a header:
SELECT
    *
FROM
    TABLE (
        system.unload (
            input => TABLE (
                SELECT
                    orderkey,
                    custkey,
                    orderstatus
                FROM
                    tpch.sf1.orders
            )
            PARTITION BY
                (orderstatus),
            location => 's3://mybucket/my/unload/location',
            format => 'TEXTFILE',
            compression => 'GZIP',
            separator => '|',
            header => true
        )
    )
Supported format parameters:
Supported compression parameters:
Each format has its own set of constraints. The CSV format exclusively supports VARCHAR columns, and AVRO files do not permit special characters in the column names.
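For example, because the CSV format only supports VARCHAR columns, non-VARCHAR columns can be cast in the input query. A minimal sketch, reusing the tpch.sf1.orders table from the earlier examples:
SELECT
    *
FROM
    TABLE (
        system.unload (
            input => TABLE (
                SELECT
                    -- cast non-VARCHAR columns for CSV compatibility
                    CAST(orderkey AS VARCHAR) AS orderkey,
                    CAST(custkey AS VARCHAR) AS custkey,
                    orderstatus
                FROM
                    tpch.sf1.orders
            )
            PARTITION BY
                (orderstatus),
            location => 's3://mybucket/my/unload/location',
            format => 'CSV'
        )
    )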
See shared session properties.