Great Lakes connectivity #
Great Lakes connectivity abstracts the details of using different table formats and file types when using certain write access statements for object storage systems. This connectivity is built into Starburst Galaxy, and is available to all users.
The following object storage catalogs support Great Lakes connectivity:
This feature supports the following table formats described on separate pages:
Any access to existing tables works transparently for all table formats. Starburst Galaxy recognizes the format of your tables by reading the metastore associated with your object storage.
Great Lakes SQL support #
The SQL support with a catalog using Great Lakes connectivity depends partially on the table format in use, and set as default per catalog. The following additional details apply:
- Globally available statements are fully supported.
- Read operations are are fully supported.
- Data management support varies across table formats.
- Schema and table management support varies across table formats.
- Catalogs can use different table formats for different tables. This results in different SQL support and different metadata and storage options for different tables. See the individual table format pages listed above for details.
- View management is fully supported.
- Materialized view management automatically uses the Iceberg table format. This behavior is independent of the default table format of the catalog.
More specifics are available in the following sections about the different table formats, and their SQL support.
Specify default table format #
For data consumers who want to read from any object storage file in any
supported table format, there are no configuration steps or commands.
Starburst Galaxy reads each file’s type and format from the object store’s
associated metadata. To check the format of a table, use the SHOW CREATE
TABLE SQL statement and refer to the
property in the query result.
For data engineers who want to create new tables in an object storage catalog, Starburst Galaxy provides a simple way to specify the default format for all newly created tables. When creating a new catalog, select from one of the options in the Default table format dialog. Your selection applies only to newly created tables, and does not convert any existing tables.
You can also use a SQL statement to specify an individual table’s format
when creating that table. With a statement such as
CREATE TABLE or
TABLE AS, specify the
type parameter with
its argument. A table created this way can override the catalog’s overall
default table format. Examples are shown in the individual table format pages
Table properties #
Each table format has its own set of properties that are supported by Great
Lakes connectivity as part of
CREATE TABLE statements.
The following query lists the Starburst Galaxy table properties supported by Great Lakes connectivity:
SELECT * FROM system.metadata.table_properties;
Read more about the available table properties for each table format:
CREATE VIEW #
CREATE VIEW statement creates a new view based on a
SELECT query. The view is a logical table that can be
referenced by future queries. Views do not contain any data. Instead, the query
stored by the view is executed every time the view is referenced by another
Views in Starburst Galaxy can be used to pull together federated data across any catalog. Read more about federating multiple data sources in Starburst Galaxy.
The use of an
OR REPLACE clause is optional and causes the view to be replaced
if it already exists, rather than raising an error.
The following examples show how to create a view:
Create a view:
CREATE VIEW view_name AS SELECT col1, col2, col3 FROM table_name;
Create a view with a view comment:
CREATE VIEW view_with_comment COMMENT 'This is a comment' AS SELECT col1, col2, col3 FROM table_name;
Create a view that summarizes the
CREATE VIEW view_summary AS SELECT col2, sum(col3) AS col_total FROM table_name GROUP BY col2;
Create a view replacing an existing view:
CREATE OR REPLACE VIEW view_name AS SELECT col1, col2, col3 FROM table_name
In addition, Great Lakes connectivity also supports the following SQL statements for views:
Security for views #
There are two different security modes for table views,
The default security mode is
CREATE VIEW view_name COMMENT 'this is a comment' SECURITY DEFINER AS SELECT col1, col2 FROM table_name;
DEFINER security mode allows the ability to restrict access to the
underlying tables that the user may not be allowed to access. Tables
referenced in the view are accessed using the permissions of the view owner (the
creator or definer of the view) rather than the user executing the query.
A view created in
INVOKER security mode is a stored query. Tables referenced
in the view are accessed using the permissions of the user executing the query
(the invoker of the view).
CREATE MATERIALIZED VIEW #
Create and validate the definition of a new materialized view
view_name of a
SELECT query. Run the REFRESH MATERIALIZED
VIEW statement after the creation to
repopulate the materialized view with data. This materialized view is a physical
representation of the query results at the time of refresh. The data is stored
and can be referenced by future queries.
If the materialized view is made from an Iceberg table, it only gets refreshed if there are new snapshots in the underlying tables.
Create a simple materialized view
view_name over the
CREATE MATERIALIZED VIEW catalog_name.schema_name.view_name AS SELECT column_data_1 FROM catalog_name.schema_name.table_name;
Read about the optional clauses and view additional usage examples for the CREATE MATERIALIZED VIEW SQL statement.
In addition, Great Lakes connectivity also supports the following SQL statements for materialized views:
CREATE SCHEMA #
The CREATE SCHEMA SQL statement creates a new, empty schema. A schema is a container that holds tables, views, and other database objects.
Create a new schema by using the default directory for the catalog or an
LOCATION. The location of the schema must be retrievable through the
SHOW CREATE SCHEMA statement. In addition,
your active role set must have the
Location privilege for the
Assign location privileges by following these steps:
- Click Access control > Roles and privileges.
- Click the name of the role you want to change.
- Click Privileges > Add privilege > Location.
- In the Storage location name text field, enter the path of the location you want to use.
The following SQL statements are examples of creating a schema using the default
directory location and using an explicit
Create a new schema
schema_name in the default location:
CREATE SCHEMA schema_name;
Create a new schema
schema_name_location, with a set
CREATE SCHEMA schema_name_location WITH (LOCATION = 's3://bucket-name/lakehouse/data/schema_name_location');
Create a new schema if it does not already exist.
CREATE SCHEMA IF NOT EXISTS schema_name;
Create a new schema
schema_name, set the
LOCATION property to
s3://bucket-name/lakehouse/data/schema_name_location and set the owner to
CREATE SCHEMA schema_name AUTHORIZATION user_name WITH (LOCATION = 's3://bucket-name/lakehouse/data/schema_name_location');
View additional usage and clause examples for the CREATE SCHEMA SQL statement. In addition, the Great Lakes connectivity also supports the following SQL statements for schemas:
DROP SCHEMA #
The DROP SCHEMA SQL statement drops an existing schema that is empty.
Drop an empty schema:
DROP SCHEMA schema_name;
CASCADE clause to drop all objects within a schema. For example, this
will drop tables, views, and materialized views.
Drop a non empty schema:
DROP SCHEMA schema_name CASCADE;
Session properties #
A session property temporarily modifies the
runtime environment for the duration of the current connection session to the
cluster. To modify the property, use the
SET SESSION statement followed by the
property name and a property-specific expression argument.
SET SESSION property_name = expression;
Session properties supported by a catalog can be set on a per-catalog basis for the current session. A catalog session property supported by more than one catalog can be set differently for each such catalog in the same session. To restrict a session property for use by a single property, prepend the catalog name to the property name:
SET SESSION catalog_name.property_name = expression;
Once set, session properties are in place only to the current connection session. You can have multiple connections to a cluster that each have a different combination of session properties. Once a session ends, either by disconnecting or creating a new session, any changes made to session properties during the previous session are lost.
Use the RESET SESSION statement to clear the current session back to Starburst Galaxy defaults. Use the SHOW SESSION statement to view all currently available session properties. For additional information, read about the SET SESSION statement.
Shared and unique session properties #
With Great Lakes connectivity, object storage table formats share many session properties across Iceberg, Delta Lake, and Hive formats.
In addition, each table format has a set of session properties uniquely supported that format. Read more about the available session properties for individual table formats:
- Iceberg session properties
- Delta Lake session properties
- Hive session properties
- Hudi session properties
Is the information on this page helpful?