Iceberg table format #

    Great Lakes connectivity abstracts the details of using different table formats and file types when using object storage catalogs.

    This page describes the features specific to the Iceberg table format when used with Great Lakes connectivity.

    Specify Iceberg format #

    Apache Iceberg is an open table format for huge analytic datasets. Starburst Galaxy allows querying data stored in files written in the Iceberg format, as defined in the Iceberg Table Spec. Iceberg tables are automatically detected and read based on information in the storage environment’s associated metadata.

    For an object storage catalog that specifies Iceberg as its default table format, no special syntax is required. A simple CREATE TABLE statement creates an Iceberg format table.
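
    For example, assuming a catalog where Iceberg is the default table format, a statement such as the following creates an Iceberg table with no special table properties:

    CREATE TABLE customer (
        name varchar,
        address varchar);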

    To create an Iceberg table when the default format is not Iceberg, add type='iceberg' as a table property in your CREATE statement. For example:

    CREATE TABLE customer (
        name varchar,
        address varchar)
    WITH (type='iceberg');
    

    Specify the file format of a table's data files with the additional format property, set to either parquet or orc. The default is orc:

    ...
    WITH (type='iceberg',
          format='parquet');
    

    Metadata tables #

    Great Lakes connectivity exposes several metadata tables for the Iceberg table format. These metadata tables contain information about the internal structure of the Iceberg table. Query each metadata table by appending the metadata table name to the table_name with a $ separator:

    SELECT * FROM catalog_name.schema_name."table_name$properties";
    

    $properties #

    The $properties table provides access to general information about the Iceberg table configuration and any additional metadata key-value tags.

    Retrieve the properties of the Iceberg table table_name by using the following query statement:

    SELECT * FROM catalog_name.schema_name."table_name$properties";
    
     key                   | value    |
    -----------------------+----------+
    write.format.default   | PARQUET  |
    

    $history #

    The $history table provides a log of the metadata changes made to the table.

    Retrieve the changelog of the Iceberg table table_name by using the following query statement:

    SELECT * FROM catalog_name.schema_name."table_name$history";
    

    The following table describes the table columns of the $history table query output:

    $history columns
    Name                | Type                        | Description
    --------------------+-----------------------------+----------------------------------------------------------------------
    made_current_at     | TIMESTAMP(3) WITH TIME ZONE | The time when the snapshot became active.
    snapshot_id         | BIGINT                      | The identifier of the snapshot.
    parent_id           | BIGINT                      | The identifier of the parent snapshot.
    is_current_ancestor | BOOLEAN                     | Whether or not this snapshot is an ancestor of the current snapshot.
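
    For example, assuming a table named table_name in catalog_name.schema_name, a query such as the following lists the snapshots that are ancestors of the current table state, most recent first:

    SELECT made_current_at, snapshot_id, parent_id
    FROM catalog_name.schema_name."table_name$history"
    WHERE is_current_ancestor = true
    ORDER BY made_current_at DESC;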

    $snapshots #

    The $snapshots table provides a detailed view of snapshots of the Iceberg table. A snapshot consists of one or more file manifests, and the complete table contents are represented by the union of all the data files in those manifests.

    Retrieve the snapshot information of the Iceberg table table_name by using the following query statement:

    SELECT * FROM catalog_name.schema_name."table_name$snapshots";
    

    The following table describes the table columns of the $snapshots table query output:

    $snapshots columns
    Name          | Type                        | Description
    --------------+-----------------------------+----------------------------------------------------------------------
    committed_at  | TIMESTAMP(3) WITH TIME ZONE | The time when the snapshot became active.
    snapshot_id   | BIGINT                      | The identifier of the snapshot.
    parent_id     | BIGINT                      | The identifier of the parent snapshot.
    operation     | VARCHAR                     | The type of operation performed on the Iceberg table. The supported operation types in Iceberg are append, replace, overwrite, and delete.
    manifest_list | VARCHAR                     | The list of Avro manifest files containing the detailed information about the snapshot changes.
    summary       | map(VARCHAR, VARCHAR)       | A summary of the changes made from the previous snapshot to the current snapshot.
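
    For example, a query such as the following counts the snapshots of the hypothetical table table_name by operation type:

    SELECT operation, count(*) AS snapshot_count
    FROM catalog_name.schema_name."table_name$snapshots"
    GROUP BY operation;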

    $manifests #

    The $manifests table provides a detailed overview of the manifests corresponding to the snapshots performed in the log of the Iceberg table.

    Retrieve the manifest information of the Iceberg table table_name by using the following query statement:

    SELECT * FROM catalog_name.schema_name."table_name$manifests";
    

    The following table describes the table columns of the $manifests table query output:

    $manifests columns
    Name                      | Type    | Description
    --------------------------+---------+----------------------------------------------------------------------
    path                      | VARCHAR | The manifest file location.
    length                    | BIGINT  | The manifest file length.
    partition_spec_id         | INTEGER | The identifier for the partition specification used to write the manifest file.
    added_snapshot_id         | BIGINT  | The identifier of the snapshot during which this manifest entry has been added.
    added_data_files_count    | INTEGER | The number of data files with status ADDED in the manifest file.
    added_rows_count          | BIGINT  | The total number of rows in all data files with status ADDED in the manifest file.
    existing_data_files_count | INTEGER | The number of data files with status EXISTING in the manifest file.
    existing_rows_count       | BIGINT  | The total number of rows in all data files with status EXISTING in the manifest file.
    deleted_data_files_count  | INTEGER | The number of data files with status DELETED in the manifest file.
    deleted_rows_count        | BIGINT  | The total number of rows in all data files with status DELETED in the manifest file.
    partitions                | ARRAY(row(contains_null BOOLEAN, contains_nan BOOLEAN, lower_bound VARCHAR, upper_bound VARCHAR)) | Partition range metadata.
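
    For example, a query such as the following reports, for the hypothetical table table_name, how many data files and rows each manifest added:

    SELECT path, added_data_files_count, added_rows_count
    FROM catalog_name.schema_name."table_name$manifests"
    ORDER BY added_rows_count DESC;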

    $partitions #

    The $partitions table provides a detailed overview of the partitions of the Iceberg table.

    Retrieve the partitions information of the Iceberg table table_name by using the following query statement:

    SELECT * FROM catalog_name.schema_name."table_name$partitions";
    

    The following table describes the table columns of the $partitions table query output:

    $partitions columns
    Name         | Type     | Description
    -------------+----------+----------------------------------------------------------------------
    partition    | ROW(...) | A row that contains the mapping of the partition column names to the partition column values.
    record_count | BIGINT   | The number of records in the partition.
    file_count   | BIGINT   | The number of files mapped in the partition.
    total_size   | BIGINT   | The size of all the files in the partition.
    data         | ROW(... ROW (min ..., max ... , null_count BIGINT, nan_count BIGINT)) | Partition range metadata.
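
    For example, a query such as the following lists the partitions of the hypothetical table table_name by total size, which can help identify partitions that are candidates for the optimize command described later on this page:

    SELECT partition, record_count, file_count, total_size
    FROM catalog_name.schema_name."table_name$partitions"
    ORDER BY total_size DESC;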

    $files #

    The $files table provides a detailed overview of the data files in the current snapshot of the Iceberg table.

    Retrieve the files information of the Iceberg table table_name by using the following query statement:

    SELECT * FROM catalog_name.schema_name."table_name$files";
    

    The following table describes the table columns of the $files table query output:

    $files columns
    Name               | Type                 | Description
    -------------------+----------------------+----------------------------------------------------------------------
    content            | INTEGER              | Type of content stored in the file. The supported content types in Iceberg are DATA(0), POSITION_DELETES(1), and EQUALITY_DELETES(2).
    file_path          | VARCHAR              | The location of the data file.
    file_format        | VARCHAR              | The format of the data file.
    record_count       | BIGINT               | The number of entries contained in the data file.
    file_size_in_bytes | BIGINT               | The data file size.
    column_sizes       | map(INTEGER, BIGINT) | Mapping between the Iceberg column ID and its corresponding size in the file.
    value_counts       | map(INTEGER, BIGINT) | Mapping between the Iceberg column ID and its corresponding count of entries in the file.
    null_value_counts  | map(INTEGER, BIGINT) | Mapping between the Iceberg column ID and its corresponding count of NULL values in the file.
    lower_bounds       | map(INTEGER, BIGINT) | Mapping between the Iceberg column ID and its corresponding lower bound in the file.
    upper_bounds       | map(INTEGER, BIGINT) | Mapping between the Iceberg column ID and its corresponding upper bound in the file.
    key_metadata       | VARBINARY            | Metadata about the encryption key used to encrypt the file, if applicable.
    split_offsets      | array(BIGINT)        | List of recommended split locations.
    equality_ids       | array(INTEGER)       | The set of field IDs used for equality comparison in equality delete files.
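
    For example, a query such as the following finds data files of the hypothetical table table_name that fall below the 10MB threshold used in the optimize example later on this page:

    SELECT file_path, file_format, file_size_in_bytes
    FROM catalog_name.schema_name."table_name$files"
    WHERE file_size_in_bytes < 10 * 1024 * 1024  -- approximately 10MB expressed in bytes
    ORDER BY file_size_in_bytes;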

    $refs #

    The $refs table provides information about Iceberg references including branches and tags.

    Retrieve the refs information of the Iceberg table table_name by using the following query statement:

    SELECT * FROM catalog_name.schema_name."table_name$refs";
    

    The following table describes the table columns of the $refs table query output:

    $refs columns
    Name                    | Type    | Description
    ------------------------+---------+----------------------------------------------------------------------
    name                    | VARCHAR | Name of the reference.
    type                    | VARCHAR | Type of the reference, BRANCH or TAG.
    snapshot_id             | BIGINT  | The snapshot ID of the reference.
    max_reference_age_in_ms | BIGINT  | The maximum age of the reference before it could be expired.
    min_snapshots_to_keep   | INTEGER | For branch only, the minimum number of snapshots to keep in a branch.
    max_snapshot_age_in_ms  | BIGINT  | For branch only, the max snapshot age allowed in a branch. Older snapshots in the branch will be expired.
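
    For example, a query such as the following lists only the branches of the hypothetical table table_name:

    SELECT name, snapshot_id
    FROM catalog_name.schema_name."table_name$refs"
    WHERE type = 'BRANCH';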

    Table procedures #

    Use the CALL statement to perform data manipulation or administrative tasks. Procedures must include a qualified catalog name. For example, for a catalog named examplecatalog:

    CALL examplecatalog.system.example_procedure();
    

    The following Iceberg table procedures are available:

    register_table #

    system.register_table('schema_name', 'table_name', 's3://bucket-name/lakehouse/data/table_name_location');
    

    Allows the caller to register an existing Iceberg table in the metastore, using its existing metadata and data files.

    The table_location must be a complete URI, as the previous example shows.

    Optionally, provide the name of a metadata file as a fourth argument. This registers the table with a specific table state, and may be necessary if Starburst Galaxy cannot automatically determine which metadata version to use:

    system.register_table('schema_name', 'table_name', 's3://bucket-name/lakehouse/data/table_name_location', '00003-409702ba-4735-4645-8f14-09537cc0b2c8.metadata.json');
    

    unregister_table #

    system.unregister_table('schema_name', 'table_name');
    

    Allows the caller to unregister an existing Iceberg table from the metastore without deleting the data.

    rollback_to_snapshot #

    system.rollback_to_snapshot('schema_name', 'table_name', 8954597067493422955);
    

    Allows the caller to roll back the state of the table to a previous snapshot ID.
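
    For example, using the CALL syntax and the examplecatalog catalog name shown earlier, a complete invocation looks similar to the following. The snapshot ID is assumed to have been found by querying the $snapshots metadata table:

    CALL examplecatalog.system.rollback_to_snapshot('schema_name', 'table_name', 8954597067493422955);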

    ALTER TABLE EXECUTE #

    The ALTER TABLE EXECUTE statement followed by a command and parameters modifies the table according to the specified command and parameters. For Iceberg tables, ALTER TABLE EXECUTE supports the commands described in the following sections.

    Use the => operator for passing named parameter values. The left side is the name of the parameter and the right side is the value being passed:

    ALTER TABLE catalog_name.schema_name.table_name EXECUTE optimize(file_size_threshold => '10MB');
    

    optimize #

    The optimize procedure improves read performance by rewriting the content of the specified table so that its data is merged into fewer, larger files. If the table is partitioned, the data compaction acts separately on each partition selected for optimization.

    All files with a size below the optional file_size_threshold parameter are merged. The default value is 100MB.

    ALTER TABLE catalog_name.schema_name.table_name EXECUTE optimize;
    

    The following statement merges files in a table that are under the 10MB file size threshold:

    ALTER TABLE catalog_name.schema_name.table_name EXECUTE optimize(file_size_threshold => '10MB');
    

    Optimize specific partitions by using a WHERE clause that filters on the partition key columns:

    ALTER TABLE partitioned_table_name EXECUTE optimize
    WHERE partition_key = 1;
    

    drop_extended_stats #

    ALTER TABLE catalog_name.schema_name.table_name EXECUTE drop_extended_stats;
    

    Removes all extended statistics information from the table.

    expire_snapshots #

    The expire_snapshots procedure removes all snapshots and all related metadata and data files. Regularly expiring snapshots is recommended to delete data files that are no longer needed, and to keep the size of table metadata small. The procedure affects all snapshots that are older than the time period configured with the retention_threshold parameter. The following example changes the threshold to a value of nine days:

    ALTER TABLE catalog_name.schema_name.table_name EXECUTE expire_snapshots(retention_threshold => '9d');
    

    The value for retention_threshold must be higher than or equal to the default value of 7d. Otherwise, the procedure fails with a message similar to:

    Retention specified (1d) is shorter than the minimum retention configured in the system (7d). The default value is 7d.

    remove_orphan_files #

    The remove_orphan_files procedure removes all files from a table’s data directory that are not associated with metadata files. It also removes files that are older than the value of the retention_threshold parameter. Deleting orphaned files periodically is recommended to keep the size of a table’s data directory under control. The following example changes the retention_threshold to nine days:

    ALTER TABLE catalog_name.schema_name.table_name EXECUTE remove_orphan_files(retention_threshold => '9d');
    

    The value for retention_threshold must be higher than or equal to the default value of 7d. Otherwise, the procedure fails with a message similar to:

    Retention specified (1d) is shorter than the minimum retention configured in the system (7d). The default value is 7d.

    Migrate Hive to Iceberg #

    Migration from Hive to Iceberg table format is supported.

    The migration process uses the ALTER TABLE SET PROPERTIES syntax. For more information, see the Hive table format page.
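
    A minimal sketch of such a migration, assuming the type table property is used to set the target format as described on the Hive table format page:

    ALTER TABLE catalog_name.schema_name.table_name SET PROPERTIES type = 'ICEBERG';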

    Session properties #

    A session property temporarily modifies an internal Starburst Galaxy setting for the duration of the current connection session to the cluster. Use the SET SESSION statement followed by the property name, then a value such as true or false to modify the property:

    SET SESSION catalog_name.session_property = expression;
    

    Use the SHOW SESSION statement to view all current session properties. For additional information, read about the SET SESSION and RESET SESSION SQL statements.
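
    For example, the following statements list the session properties of a hypothetical catalog named catalog_name, and then reset one property to its default value:

    SHOW SESSION LIKE 'catalog_name.%';
    RESET SESSION catalog_name.session_property;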

    Catalog session properties are internal session properties that can be set on a per-catalog basis. These properties must be set separately for each catalog by including the catalog name before the property name, for example, catalog_name.property_name.

    Session properties are linked to the current session, which lets you have multiple connections to a cluster that each have different values for the same session properties. Once a session ends, either by disconnecting or creating a new session, any changes made to session properties during the previous session are lost.

    The following sections describe the session properties supported by the Iceberg table format:

    collect_extended_statistics_on_write #

    SET SESSION catalog_name.collect_extended_statistics_on_write = true;
    

    Collect extended statistics while writing files. The default value is true.

    compression_codec #

    SET SESSION catalog_name.compression_codec = 'DEFAULT';
    

    The compression codec to use when writing new data files. The possible values are:

    • NONE
    • DEFAULT
    • SNAPPY
    • LZ4
    • ZSTD
    • GZIP

    The default value is DEFAULT.

    dynamic_filtering_wait_timeout #

    SET SESSION catalog_name.dynamic_filtering_wait_timeout = '10s';
    

    The duration to wait for completion of dynamic filtering during split generation. Dynamic filtering optimizations significantly improve the performance of queries with selective joins by avoiding reading of data that would be filtered by join conditions.

    This extra wait time can potentially result in significant overall savings in query and CPU time, if dynamic filtering is able to reduce the amount of scanned data. In the previous example, the dynamic_filtering_wait_timeout is set to 10 seconds.

    expire_snapshots_min_retention #

    SET SESSION catalog_name.expire_snapshots_min_retention = '7d';
    

    The minimum retention period for the expire_snapshots procedure. The default value is 7d.

    extended_statistics_enabled #

    SET SESSION catalog_name.extended_statistics_enabled = true;
    

    Enables statistics collection with the ANALYZE statement and the use of extended statistics. The default value is true. The statistics_enabled session property must be set to true. Otherwise, the collection of any statistics is disabled.

    Use the ANALYZE statement to populate data size and number of distinct values (NDV) extended table statistics in Iceberg. The minimum value, maximum value, value count, and null value count statistics are computed on the fly out of the transaction log of the Iceberg table.

    merge_manifests_on_write #

    SET SESSION catalog_name.merge_manifests_on_write = true;
    

    Compact the manifest files when performing write operations. The default value is true.

    minimum_assigned_split_weight #

    SET SESSION catalog_name.minimum_assigned_split_weight = 0.05;
    

    The minimum assigned split weight when size based split weighting is enabled. The default value is 0.05.

    orc_bloom_filters_enabled #

    SET SESSION catalog_name.orc_bloom_filters_enabled = false;
    

    Enable bloom filters for predicate pushdown. The default value is false.

    orc_lazy_read_small_ranges #

    SET SESSION catalog_name.orc_lazy_read_small_ranges = true;
    

    Read small file segments lazily. The default value is true.

    orc_max_buffer_size #

    SET SESSION catalog_name.orc_max_buffer_size = '8MB';
    

    Maximum size of a single read. The default value is 8MB.

    orc_max_merge_distance #

    SET SESSION catalog_name.orc_max_merge_distance = '1MB';
    

    Maximum size of the gap between two reads to merge into a single read. The default value is 1MB.

    orc_max_read_block_size #

    SET SESSION catalog_name.orc_max_read_block_size = '16MB';
    

    Soft maximum size of Trino blocks produced by the ORC reader. The default value is 16MB.

    orc_native_zstd_decompressor_enabled #

    SET SESSION catalog_name.orc_native_zstd_decompressor_enabled = true;
    

    Enable using native zstd library for faster decompression of ORC files. The default value is true.

    orc_nested_lazy_enabled #

    SET SESSION catalog_name.orc_nested_lazy_enabled = true;
    

    Lazily read nested data. The default value is true.

    orc_stream_buffer_size #

    SET SESSION catalog_name.orc_stream_buffer_size = '8MB';
    

    Size of buffer for streaming reads. The default value is 8MB.

    orc_string_statistics_limit #

    SET SESSION catalog_name.orc_string_statistics_limit = '64B';
    

    Maximum size of the string statistics. The default value is 64B.

    orc_tiny_stripe_threshold #

    SET SESSION catalog_name.orc_tiny_stripe_threshold = '8MB';
    

    The threshold below which an ORC stripe or file is read in its entirety. The default value is 8MB.

    orc_writer_max_dictionary_memory #

    SET SESSION catalog_name.orc_writer_max_dictionary_memory = '16MB';
    

    The maximum dictionary memory. The default value is 16MB.

    orc_writer_max_stripe_rows #

    SET SESSION catalog_name.orc_writer_max_stripe_rows = 10000000;
    

    The maximum stripe row count. The default value is 10000000.

    orc_writer_max_stripe_size #

    SET SESSION catalog_name.orc_writer_max_stripe_size = '64MB';
    

    The maximum stripe size. The default is 64MB.

    orc_writer_min_stripe_size #

    SET SESSION catalog_name.orc_writer_min_stripe_size = '32MB';
    

    The minimum stripe size. The default is 32MB.

    orc_writer_validate_mode #

    SET SESSION catalog_name.orc_writer_validate_mode = 'BOTH';
    

    The level of detail in ORC validation. The possible values are:

    • HASHED
    • DETAILED
    • BOTH

    The default value is BOTH.

    orc_writer_validate_percentage #

    SET SESSION catalog_name.orc_writer_validate_percentage = 0;
    

    The percentage of written files to validate by rereading them. The default value is 0.

    parquet_max_read_block_row_count #

    SET SESSION catalog_name.parquet_max_read_block_row_count = 8192;
    

    Sets the maximum number of rows read in a batch. The default value is 8192.

    parquet_max_read_block_size #

    SET SESSION catalog_name.parquet_max_read_block_size = '16MB';
    

    The maximum block size used when reading Parquet files. The default value is 16MB.

    parquet_native_snappy_decompressor_enabled #

    SET SESSION catalog_name.parquet_native_snappy_decompressor_enabled = true;
    

    Enable using native SNAPPY library for faster decompression of Parquet files. The default value is true.

    parquet_optimized_reader_enabled #

    SET SESSION catalog_name.parquet_optimized_reader_enabled = true;
    

    Specifies whether batched column readers are used when reading Parquet files for improved performance. Set this property to false to disable the optimized Parquet reader. The default value is true.

    parquet_native_zstd_decompressor_enabled #

    SET SESSION catalog_name.parquet_native_zstd_decompressor_enabled = true;
    

    Enable using native ZSTD library for faster decompression of Parquet files. The default value is true.

    parquet_optimized_nested_reader_enabled #

    SET SESSION catalog_name.parquet_optimized_nested_reader_enabled = true;
    

    Specifies whether batched column readers are used when reading ARRAY, MAP, and ROW types from Parquet files for improved performance. Set this property to false to disable the optimized Parquet reader for structural data types. The default value is true.

    parquet_use_bloom_filter #

    SET SESSION catalog_name.parquet_use_bloom_filter = true;
    

    Specifies whether Bloom filters are used for predicate pushdown when reading Parquet files. The default value is true.

    parquet_vectorized_decoding_enabled #

    SET SESSION catalog_name.parquet_vectorized_decoding_enabled = true;
    

    Enable the use of the Java Vector API for faster decoding of Parquet files. The default value is true.

    parquet_writer_batch_size #

    SET SESSION catalog_name.parquet_writer_batch_size = 10000;
    

    Maximum number of rows processed by the Parquet writer in a batch. The default value is 10000.

    parquet_writer_block_size #

    SET SESSION catalog_name.parquet_writer_block_size = '128MB';
    

    The maximum block size created by the Parquet writer. The default value is 128MB.

    parquet_writer_page_size #

    SET SESSION catalog_name.parquet_writer_page_size = '1MB';
    

    The maximum page size created by the Parquet writer. The default value is 1MB.

    projection_pushdown_enabled #

    SET SESSION catalog_name.projection_pushdown_enabled = true;
    

    Enable projection pushdown. The default value is true.

    remove_orphan_files_min_retention #

    SET SESSION catalog_name.remove_orphan_files_min_retention = '7d';
    

    The minimum retention for the files taken into account for removal by the remove_orphan_files procedure. The default value is 7d.

    sorted_writing_enabled #

    SET SESSION catalog_name.sorted_writing_enabled = true;
    

    Enable sorted writes. The default value is true.

    statistics_enabled #

    SET SESSION catalog_name.statistics_enabled = true;
    

    Enables table statistics for performance improvements. The default value is true. If the value is set to false, the extended_statistics_enabled session property described earlier is also disabled.

    target_max_file_size #

    SET SESSION catalog_name.target_max_file_size = '1GB';
    

    Target maximum size of written files. The default value is 1GB.

    use_file_size_from_metadata #

    SET SESSION catalog_name.use_file_size_from_metadata = true;
    

    Use the file size stored in Iceberg metadata. The default value is true.

    Iceberg SQL support #

    When using the Iceberg table format with Great Lakes connectivity, the general SQL support details apply.