Great Lakes connector#
The Great Lakes connector provides a unified way to interact with data stored in various table formats including Hive, Iceberg, Delta Lake, and Hudi that share the same storage system and metastore service. The connector acts as a proxy layer on top of the object store connectors, letting you query and write data across table formats from the same catalog.
Existing tables are automatically detected regardless of table format. Starburst Enterprise platform (SEP) determines the table format by reading the metadata from the configured metastore.
The connector supports most features and integrations of the underlying table formats. See the documentation for the Hive, Iceberg, Delta Lake, and Hudi connectors for details.
Note
The Great Lakes connector is available as a public preview in Starburst Enterprise. Contact your Starburst account team with questions or feedback.
Requirements#
To use Great Lakes, you need:
Network access from the SEP coordinator and worker nodes to the object storage system.
Access to a Hive metastore service (HMS) 3.1.2 or later, an AWS Glue catalog, or Unity Catalog.
Data files stored in Parquet (default), ORC, or Avro format on a supported file system. The Hive table type supports additional file formats.
Configuration#
To configure the Great Lakes connector, create a catalog properties file that
specifies the Great Lakes connector by setting the connector.name to
great_lakes.
By default, the connector uses a Hive Thrift metastore.
You must also select and configure one of the supported file systems.
The following example instead configures AWS Glue as the metastore:
connector.name=great_lakes
hive.metastore=glue
fs.x.enabled=true
Replace fs.x.enabled with the configuration property for the desired file system.
Each supported metastore type has specific configuration properties along with general metastore configuration properties.
The Great Lakes connector supports the following metastores:
- THRIFT: Hive Thrift metastore (default)
- GLUE: AWS Glue Catalog
- UNITY: Databricks Unity Catalog
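For example, a minimal catalog sketch using the default Thrift metastore might look like the following, with fs.x.enabled replaced as described above. The metastore host is hypothetical and must match your environment:

connector.name=great_lakes
hive.metastore=thrift
# hypothetical metastore host
hive.metastore.uri=thrift://metastore.example.net:9083
fs.x.enabled=true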
General configuration properties#
The following table describes catalog configuration properties for the connector:
| Property name | Description | Default |
|---|---|---|
|  | The default table type for newly created tables. Possible values correspond to the supported table types. |  |
|  | The number of threads used to read table metadata in parallel. |  |
|  | The default file format for Iceberg tables. Possible values are PARQUET, ORC, and AVRO. | PARQUET |
|  | Must be enabled when using Unity Catalog as the metastore. |  |
File system access configuration#
The connector supports accessing the following file systems:
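As a sketch, enabling native S3 file system support could look like the following in the catalog properties file. The fs.native-s3.enabled and s3.region properties follow the standard object storage file system configuration and are assumptions here:

fs.native-s3.enabled=true
s3.region=us-east-1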
SQL support#
This connector provides read and write access to data and metadata. The connector supports globally available and read operation statements, as well as statements supported by individual table formats.
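As an illustration, tables of different formats in the same Great Lakes catalog can be combined in a single query. The catalog, schema, table, and column names below are hypothetical:

SELECT o.order_id, c.region
FROM lakes.sales.orders AS o      -- for example, an Iceberg table
JOIN lakes.sales.customers AS c   -- for example, a Hive table
  ON o.customer_id = c.customer_id;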
Procedures#
The connector does not support the Hive flush_filesystem_cache procedure.
Table properties#
Each table format has its own set of table properties when used with the Great
Lakes connector as part of CREATE TABLE statements. Create an Iceberg table
with the Great Lakes connector by setting the table type to ICEBERG.
CREATE TABLE iceberg_table (
c1 INTEGER,
c2 DATE,
c3 DOUBLE
)
WITH (
    type = 'ICEBERG',
format = 'PARQUET',
partitioning = ARRAY['c1', 'c2'],
sorted_by = ARRAY['c3']
);
Read more about the available table properties for each table format:
Creating tables is not supported for the Hudi connector.
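As a further sketch, a Delta Lake table follows the same pattern. The DELTA type value is assumed by analogy with ICEBERG; partitioned_by and checkpoint_interval are table properties of the Delta Lake connector:

CREATE TABLE delta_table (
    c1 INTEGER,
    c2 DATE
)
WITH (
    type = 'DELTA',               -- assumed type value for Delta Lake tables
    partitioned_by = ARRAY['c2'],
    checkpoint_interval = 10
);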
View management#
The connector does not support the following view management features:
Table functions#
The connector provides the following table functions.
UNLOAD#
The connector supports the UNLOAD function, which exports query results
directly to external storage locations.
Warning
Fault-tolerant clusters do not support the UNLOAD function.
SELECT * FROM TABLE(system.unload(
input => TABLE(...) [PARTITION BY col [, ...]],
location => 'storage_path',
format => 'file_format'
[, compression => 'compression_type']
[, separator => 'delimiter']
[, header => true|false]
)
)
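For example, the following hypothetical call writes a partitioned result set as CSV files with a header row to an illustrative S3 location, assuming a tpch catalog is available and CSV is among the supported formats:

SELECT * FROM TABLE(system.unload(
    input => TABLE(tpch.tiny.orders) PARTITION BY orderstatus,
    location => 's3://example-bucket/unload/orders',
    format => 'CSV',
    header => true
));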
For more information about the UNLOAD table function, read the
documentation.
table_changes#
The connector supports the table_changes table function, which reads row-level
changes between two versions of a table.
SELECT
*
FROM
TABLE(
system.table_changes(
schema_name => 'test_schema',
table_name => 'tableName',
since_version => 0
)
);
For more information about the table_changes function, read the
Iceberg and Delta Lake
connectors documentation.
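The output of table_changes includes change metadata columns alongside the table's own columns. As a sketch assuming the Delta Lake connector's metadata columns and a hypothetical orderkey column, the result can be filtered by change type:

SELECT orderkey, _change_type, _commit_version
FROM
    TABLE(
        system.table_changes(
            schema_name => 'test_schema',
            table_name => 'tableName',
            since_version => 0
        )
    )
WHERE _change_type IN ('insert', 'update_postimage');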
Session properties#
The Great Lakes connector supports a set of session properties. Use the SHOW SESSION statement to view all currently available session properties.
A session property temporarily modifies the runtime environment for the
duration of the current connection.
To modify the property, use the SET SESSION statement followed by the property name and a property-specific expression argument.
SET SESSION property_name = expression;
Session properties supported by a catalog can be set on a per-catalog basis for the current session. A catalog session property supported by more than one catalog can be set differently for each such catalog in the same session. To set a session property for a single catalog, prepend the catalog name to the property name:
SET SESSION catalog_name.property_name = expression;
Session properties apply only to the current connection. You can have multiple connections to a cluster that each have a different combination of session properties. Once a session ends, either by disconnecting or creating a new session, any changes made to session properties during the previous session are lost.
Use the RESET SESSION statement to clear the current session back to SEP defaults. For additional information, read about the SET SESSION statement.
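For example, assuming a catalog named lakes that exposes the compression_codec session property of its underlying table format, a session could be inspected, adjusted, and reset as follows:

SHOW SESSION LIKE 'lakes.%';
SET SESSION lakes.compression_codec = 'ZSTD';
RESET SESSION lakes.compression_codec;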
Security#
The connector does not support the following security features:
For more information about security, read the documentation.