Iceberg table format #
Great Lakes connectivity abstracts the details of using different table formats and file types when using object storage catalogs.
This page describes the features specific to the Iceberg table format when used with Great Lakes connectivity.
Specify Iceberg format #
Apache Iceberg is an open table format for huge analytic datasets. Starburst Galaxy allows querying data stored in files written in the Iceberg format, as defined in the Iceberg Table Spec. Iceberg tables are automatically detected and read based on information in the storage environment’s associated metadata.
For an object storage catalog that specifies Iceberg as its default table format, no special syntax is required. A simple CREATE TABLE statement creates an Iceberg format table.
To create an Iceberg table when the default format is not Iceberg, add type='iceberg' as a table property in your CREATE statement. For example:
CREATE TABLE customer (
    name varchar,
    address varchar)
WITH (type='iceberg');
Specify the file format of table data files with an additional format property with the value parquet or orc, defaulting to orc:
...
WITH (type='iceberg',
format='parquet');
Metadata tables #
Great Lakes connectivity exposes several metadata tables for the Iceberg table format. These metadata tables contain information about the internal structure of the Iceberg table. Query each metadata table by appending the metadata table name to the table_name with a $ separator:
SELECT * FROM catalog_name.schema_name."table_name$properties";
$properties #
The $properties table provides access to general information about the Iceberg table configuration and any additional metadata key-value tags.
Retrieve the properties of the Iceberg table table_name by using the following query statement:
SELECT * FROM catalog_name.schema_name."table_name$properties";
key                  | value
---------------------+---------
write.format.default | PARQUET
$history #
The $history table provides a log of the metadata changes made to the table.
Retrieve the changelog of the Iceberg table table_name by using the following query statement:
SELECT * FROM catalog_name.schema_name."table_name$history";
The following table describes the columns of the $history table query output:
Name | Type | Description |
---|---|---|
made_current_at | TIMESTAMP(3) WITH TIME ZONE | The time when the snapshot became active. |
snapshot_id | BIGINT | The identifier of the snapshot. |
parent_id | BIGINT | The identifier of the parent snapshot. |
is_current_ancestor | BOOLEAN | Whether or not this snapshot is an ancestor of the current snapshot. |
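For example, a hypothetical query that lists only the snapshots in the current table lineage, newest first, filters on the is_current_ancestor column:
SELECT made_current_at, snapshot_id, parent_id
FROM catalog_name.schema_name."table_name$history"
WHERE is_current_ancestor = true
ORDER BY made_current_at DESC;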
$snapshots #
The $snapshots table provides a detailed view of snapshots of the Iceberg table. A snapshot consists of one or more file manifests, and the complete table contents are represented by the union of all the data files in those manifests.
Retrieve the snapshot information of the Iceberg table table_name by using the following query statement:
SELECT * FROM catalog_name.schema_name."table_name$snapshots";
The following table describes the columns of the $snapshots table query output:
Name | Type | Description |
---|---|---|
committed_at | TIMESTAMP(3) WITH TIME ZONE | The time when the snapshot became active. |
snapshot_id | BIGINT | The identifier of the snapshot. |
parent_id | BIGINT | The identifier of the parent snapshot. |
operation | VARCHAR | The type of operation performed on the Iceberg table. The supported operation types in Iceberg are append, replace, overwrite, and delete. |
manifest_list | VARCHAR | The list of Avro manifest files containing the detailed information about the snapshot changes. |
summary | map(VARCHAR, VARCHAR) | A summary of the changes made from the previous snapshot to the current snapshot. |
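For example, to review the most recent changes to a table, a query along these lines lists the latest snapshots and their operation types; the LIMIT value is only illustrative:
SELECT committed_at, snapshot_id, operation, summary
FROM catalog_name.schema_name."table_name$snapshots"
ORDER BY committed_at DESC
LIMIT 5;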
$manifests #
The $manifests table provides a detailed overview of the manifests corresponding to the snapshots recorded in the log of the Iceberg table.
Retrieve the manifest information of the Iceberg table table_name by using the following query statement:
SELECT * FROM catalog_name.schema_name."table_name$manifests";
The following table describes the columns of the $manifests table query output:
Name | Type | Description |
---|---|---|
path | VARCHAR | The manifest file location. |
length | BIGINT | The manifest file length. |
partition_spec_id | INTEGER | The identifier for the partition specification used to write the manifest file. |
added_snapshot_id | BIGINT | The identifier of the snapshot during which this manifest entry has been added. |
added_data_files_count | INTEGER | The number of data files with status ADDED in the manifest file. |
added_rows_count | BIGINT | The total number of rows in all data files with status ADDED in the manifest file. |
existing_data_files_count | INTEGER | The number of data files with status EXISTING in the manifest file. |
existing_rows_count | BIGINT | The total number of rows in all data files with status EXISTING in the manifest file. |
deleted_data_files_count | INTEGER | The number of data files with status DELETED in the manifest file. |
deleted_rows_count | BIGINT | The total number of rows in all data files with status DELETED in the manifest file. |
partitions | ARRAY(row(contains_null BOOLEAN, contains_nan BOOLEAN, lower_bound VARCHAR, upper_bound VARCHAR)) | Partition range metadata. |
$partitions #
The $partitions table provides a detailed overview of the partitions of the Iceberg table.
Retrieve the partitions information of the Iceberg table table_name by using the following query statement:
SELECT * FROM catalog_name.schema_name."table_name$partitions";
The following table describes the columns of the $partitions table query output:
Name | Type | Description |
---|---|---|
partition | ROW(...) | A row that contains the mapping of the partition column names to the partition column values. |
record_count | BIGINT | The number of records in the partition. |
file_count | BIGINT | The number of files mapped in the partition. |
total_size | BIGINT | The size of all the files in the partition. |
data | ROW(... ROW (min ..., max ..., null_count BIGINT, nan_count BIGINT)) | Partition range metadata. |
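As an illustration, a query such as the following ranks partitions by their size on disk, which can help identify candidates for the optimize command described later on this page; the LIMIT value is only an example:
SELECT "partition", record_count, file_count, total_size
FROM catalog_name.schema_name."table_name$partitions"
ORDER BY total_size DESC
LIMIT 10;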
$files #
The $files table provides a detailed overview of the data files in the current snapshot of the Iceberg table.
Retrieve the files information of the Iceberg table table_name by using the following query statement:
SELECT * FROM catalog_name.schema_name."table_name$files";
The following table describes the columns of the $files table query output:
Name | Type | Description |
---|---|---|
content | INTEGER | Type of content stored in the file. The supported content types in Iceberg are DATA(0), POSITION_DELETES(1), and EQUALITY_DELETES(2). |
file_path | VARCHAR | The data file location. |
file_format | VARCHAR | The format of the data file. |
record_count | BIGINT | The number of entries contained in the data file. |
file_size_in_bytes | BIGINT | The data file size. |
column_sizes | map(INTEGER, BIGINT) | Mapping between the Iceberg column ID and its corresponding size in the file. |
value_counts | map(INTEGER, BIGINT) | Mapping between the Iceberg column ID and its corresponding count of entries in the file. |
null_value_counts | map(INTEGER, BIGINT) | Mapping between the Iceberg column ID and its corresponding count of NULL values in the file. |
lower_bounds | map(INTEGER, VARCHAR) | Mapping between the Iceberg column ID and its corresponding lower bound in the file. |
upper_bounds | map(INTEGER, VARCHAR) | Mapping between the Iceberg column ID and its corresponding upper bound in the file. |
key_metadata | VARBINARY | Metadata about the encryption key used to encrypt the file, if applicable. |
split_offsets | array(BIGINT) | List of recommended split locations. |
equality_ids | array(INTEGER) | The set of field IDs used for equality comparison in equality delete files. |
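For example, the following sketch counts data files below a chosen size threshold, which can indicate whether running the optimize command is worthwhile; the 10MB threshold is only an illustrative value:
SELECT count(*) AS small_files, sum(file_size_in_bytes) AS total_bytes
FROM catalog_name.schema_name."table_name$files"
WHERE file_size_in_bytes < 10 * 1024 * 1024;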
$refs #
The $refs table provides information about Iceberg references, including branches and tags.
Retrieve the refs information of the Iceberg table table_name by using the following query statement:
SELECT * FROM catalog_name.schema_name."table_name$refs";
The following table describes the columns of the $refs table query output:
Name | Type | Description |
---|---|---|
name | VARCHAR | Name of the reference. |
type | VARCHAR | Type of the reference, either BRANCH or TAG. |
snapshot_id | BIGINT | The snapshot ID of the reference. |
max_reference_age_in_ms | BIGINT | The maximum age of the reference before it can be expired. |
min_snapshots_to_keep | INTEGER | For branches only, the minimum number of snapshots to keep in the branch. |
max_snapshot_age_in_ms | BIGINT | For branches only, the maximum snapshot age allowed in the branch. Older snapshots in the branch are expired. |
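For instance, to list only branches, filter on the type column:
SELECT name, snapshot_id, min_snapshots_to_keep, max_snapshot_age_in_ms
FROM catalog_name.schema_name."table_name$refs"
WHERE type = 'BRANCH';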
Table procedures #
Use the CALL statement to perform data manipulation or administrative tasks. Procedures must include a qualified catalog name. For example, for a catalog named examplecatalog:
CALL examplecatalog.system.example_procedure();
The following Iceberg table procedures are available:
register_table #
system.register_table('schema_name', 'table_name', 's3://bucket-name/lakehouse/data/table_name_location');
Allows the caller to register an existing Iceberg table in the metastore, using its existing metadata and data files.
The table_location must be a complete URI, as the previous example shows.
In addition, you can provide a metadata file name to register the table with a specific metadata version. This can be used to register the table with a specific table state, and may be necessary if Starburst Galaxy cannot automatically determine which metadata version to use:
system.register_table('schema_name', 'table_name', 's3://bucket-name/lakehouse/data/table_name_location', '00003-409702ba-4735-4645-8f14-09537cc0b2c8.metadata.json');
unregister_table #
system.unregister_table('schema_name', 'table_name');
Allows the caller to unregister an existing Iceberg table from the metastore without deleting the data.
rollback_to_snapshot #
system.rollback_to_snapshot('schema_name', 'table_name', 8954597067493422955);
Allows the caller to roll back the state of the table to a previous snapshot ID.
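A typical workflow, sketched here with the examplecatalog name used earlier, first looks up the target snapshot ID in the $snapshots metadata table and then passes it to the procedure:
SELECT snapshot_id, committed_at, operation
FROM examplecatalog.schema_name."table_name$snapshots"
ORDER BY committed_at DESC;
CALL examplecatalog.system.rollback_to_snapshot('schema_name', 'table_name', 8954597067493422955);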
ALTER TABLE EXECUTE #
The ALTER TABLE EXECUTE statement followed by a command and parameters modifies the table according to the specified command and parameters. For Iceberg tables, ALTER TABLE EXECUTE supports the following commands: optimize, drop_extended_stats, expire_snapshots, and remove_orphan_files.
Use the => operator for passing named parameter values. The left side is the name of the parameter and the right side is the value being passed:
ALTER TABLE catalog_name.schema_name.table_name EXECUTE optimize(file_size_threshold => '10MB');
optimize #
The optimize command improves read performance by rewriting the content of the specified table to reduce the number of files, which increases the size of the individual files. If the table is partitioned, the data compaction acts separately on each partition selected for optimization.
All files with a size below the optional file_size_threshold parameter are merged. The default value is 100MB.
ALTER TABLE catalog_name.schema_name.table_name EXECUTE optimize;
The following statement merges files in a table that are under the 10MB file size threshold:
ALTER TABLE catalog_name.schema_name.table_name EXECUTE optimize(file_size_threshold => '10MB');
To optimize only specific partitions, use a WHERE clause with the columns used as the partition key:
ALTER TABLE partitioned_table_name EXECUTE optimize
WHERE partition_key = 1;
drop_extended_stats #
ALTER TABLE catalog_name.schema_name.table_name EXECUTE drop_extended_stats;
Removes all extended statistics information from the table.
expire_snapshots #
The expire_snapshots procedure removes all snapshots and all related metadata and data files. Regularly expiring snapshots is recommended to delete data files that are no longer needed, and to keep the size of table metadata small. The procedure affects all snapshots that are older than the time period configured with the retention_threshold parameter. The following example changes the threshold to a value of nine days:
ALTER TABLE catalog_name.schema_name.table_name EXECUTE expire_snapshots(retention_threshold => '9d');
The value for retention_threshold must be higher than or equal to the default value of 7d. Otherwise, the procedure fails with a message similar to:
Retention specified (1d) is shorter than the minimum retention configured in the system (7d)
remove_orphan_files #
The remove_orphan_files procedure removes all files from a table's data directory that are not associated with metadata files and that are older than the value of the retention_threshold parameter. Deleting orphaned files periodically is recommended to keep the size of a table's data directory under control. The following example changes the retention_threshold to nine days:
ALTER TABLE catalog_name.schema_name.table_name EXECUTE remove_orphan_files(retention_threshold => '9d');
The value for retention_threshold must be higher than or equal to the default value of 7d. Otherwise, the procedure fails with a message similar to:
Retention specified (1d) is shorter than the minimum retention configured in the system (7d)
Migrate Hive to Iceberg #
Migration from the Hive table format to the Iceberg table format is supported. The migration process uses the ALTER TABLE SET PROPERTIES syntax. See the Hive table format page for more information.
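For illustration only, and assuming the same type table property shown in the Specify Iceberg format section is accepted by ALTER TABLE SET PROPERTIES, a migration statement may look like the following; consult the Hive table format page for the exact supported syntax:
ALTER TABLE catalog_name.schema_name.table_name SET PROPERTIES type = 'ICEBERG';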
Session properties #
A session property temporarily modifies an internal Starburst Galaxy setting for the duration of the current connection session to the cluster. Use the SET SESSION statement followed by the property name, then a value such as true or false, to modify the property:
SET SESSION catalog_name.session_property = expression;
Use the SHOW SESSION statement to view all current session properties. For additional information, read about the SET SESSION and RESET SESSION SQL statements.
Catalog session properties are internal session properties that can be set on a per-catalog basis. These properties must be set separately for each catalog by including the catalog name before the property name, for example, catalog_name.property_name.
Session properties are linked to the current session, which lets you have multiple connections to a cluster that each have different values for the same session properties. Once a session ends, either by disconnecting or creating a new session, any changes made to session properties during the previous session are lost.
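For example, a hypothetical session could switch the compression codec for a batch of writes and then restore the default, using the examplecatalog name from earlier and the compression_codec property described below:
SET SESSION examplecatalog.compression_codec = 'ZSTD';
-- run the write queries that should use ZSTD compression
RESET SESSION examplecatalog.compression_codec;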
The following sections describe the session properties supported by the Iceberg table format:
collect_extended_statistics_on_write #
SET SESSION catalog_name.collect_extended_statistics_on_write = true;
Collect extended statistics while writing files. The default value is true.
compression_codec #
SET SESSION catalog_name.compression_codec = 'DEFAULT';
The compression codec is used when writing new data files. The possible values are:
NONE
DEFAULT
SNAPPY
LZ4
ZSTD
GZIP
The default value is DEFAULT.
dynamic_filtering_wait_timeout #
SET SESSION catalog_name.dynamic_filtering_wait_timeout = '10s';
The duration to wait for completion of dynamic filtering during split generation. Dynamic filtering optimizations significantly improve the performance of queries with selective joins by avoiding reading of data that would be filtered by join conditions.
This extra wait time can potentially result in significant overall savings in query and CPU time, if dynamic filtering is able to reduce the amount of scanned data. In the previous example, the dynamic_filtering_wait_timeout is set to 10 seconds.
expire_snapshots_min_retention #
SET SESSION catalog_name.expire_snapshots_min_retention = '7d';
The minimum retention period for the expire_snapshots procedure. The default value is 7d.
extended_statistics_enabled #
SET SESSION catalog_name.extended_statistics_enabled = true;
Enables statistics collection with the ANALYZE statement and the use of extended statistics. The default value is true. The statistics_enabled session property must be set to true. Otherwise, the collection of any statistics is disabled.
Use the ANALYZE statement to populate data size and number of distinct values (NDV) extended table statistics in Iceberg. The minimum value, maximum value, value count, and null value count statistics are computed on the fly out of the transaction log of the Iceberg table.
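For example, collect extended statistics for an existing table with a statement such as:
ANALYZE catalog_name.schema_name.table_name;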
merge_manifests_on_write #
SET SESSION catalog_name.merge_manifests_on_write = true;
Compact the manifest files when performing write operations. The default value is true.
minimum_assigned_split_weight #
SET SESSION catalog_name.minimum_assigned_split_weight = 0.05;
The minimum assigned split weight when size-based split weighting is enabled. The default value is 0.05.
orc_bloom_filters_enabled #
SET SESSION catalog_name.orc_bloom_filters_enabled = false;
Enable bloom filters for predicate pushdown. The default value is false.
orc_lazy_read_small_ranges #
SET SESSION catalog_name.orc_lazy_read_small_ranges = true;
Read small file segments lazily. The default value is true.
orc_max_buffer_size #
SET SESSION catalog_name.orc_max_buffer_size = '8MB';
Maximum size of a single read. The default value is 8MB.
orc_max_merge_distance #
SET SESSION catalog_name.orc_max_merge_distance = '1MB';
Maximum size of the gap between two reads to merge into a single read. The default value is 1MB.
orc_max_read_block_size #
SET SESSION catalog_name.orc_max_read_block_size = '16MB';
Soft maximum size of Trino blocks produced by the ORC reader. The default value is 16MB.
orc_native_zstd_decompressor_enabled #
SET SESSION catalog_name.orc_native_zstd_decompressor_enabled = true;
Enable using the native zstd library for faster decompression of ORC files. The default value is true.
orc_nested_lazy_enabled #
SET SESSION catalog_name.orc_nested_lazy_enabled = true;
Lazily read nested data. The default value is true.
orc_stream_buffer_size #
SET SESSION catalog_name.orc_stream_buffer_size = '8MB';
Size of the buffer for streaming reads. The default value is 8MB.
orc_string_statistics_limit #
SET SESSION catalog_name.orc_string_statistics_limit = '64B';
Maximum size of the string statistics. The default value is 64B.
orc_tiny_stripe_threshold #
SET SESSION catalog_name.orc_tiny_stripe_threshold = '8MB';
The threshold below which an ORC stripe or file is read in its entirety. The default value is 8MB.
orc_writer_max_dictionary_memory #
SET SESSION catalog_name.orc_writer_max_dictionary_memory = '16MB';
The maximum dictionary memory. The default value is 16MB.
orc_writer_max_stripe_rows #
SET SESSION catalog_name.orc_writer_max_stripe_rows = 10000000;
The maximum stripe row count. The default value is 10000000.
orc_writer_max_stripe_size #
SET SESSION catalog_name.orc_writer_max_stripe_size = '64MB';
The maximum stripe size. The default value is 64MB.
orc_writer_min_stripe_size #
SET SESSION catalog_name.orc_writer_min_stripe_size = '32MB';
The minimum stripe size. The default value is 32MB.
orc_writer_validate_mode #
SET SESSION catalog_name.orc_writer_validate_mode = 'BOTH';
The level of detail in ORC validation. The possible values are:
HASHED
DETAILED
BOTH
The default value is BOTH.
orc_writer_validate_percentage #
SET SESSION catalog_name.orc_writer_validate_percentage = 0;
The percentage of written files to validate by rereading them. The default value is 0.
parquet_max_read_block_row_count #
SET SESSION catalog_name.parquet_max_read_block_row_count = 8192;
Sets the maximum number of rows read in a batch. The default value is 8192.
parquet_max_read_block_size #
SET SESSION catalog_name.parquet_max_read_block_size = '16MB';
The maximum block size used when reading Parquet files. The default value is 16MB.
parquet_native_snappy_decompressor_enabled #
SET SESSION catalog_name.parquet_native_snappy_decompressor_enabled = true;
Enable using the native SNAPPY library for faster decompression of Parquet files. The default value is true.
parquet_optimized_reader_enabled #
SET SESSION catalog_name.parquet_optimized_reader_enabled = true;
Specifies whether batched column readers are used when reading Parquet files for improved performance. Set this property to false to disable the optimized Parquet reader. The default value is true.
parquet_native_zstd_decompressor_enabled #
SET SESSION catalog_name.parquet_native_zstd_decompressor_enabled = true;
Enable using the native ZSTD library for faster decompression of Parquet files. The default value is true.
parquet_optimized_nested_reader_enabled #
SET SESSION catalog_name.parquet_optimized_nested_reader_enabled = true;
Specifies whether batched column readers are used when reading ARRAY, MAP, and ROW types from Parquet files for improved performance. Set this property to false to disable the optimized Parquet reader for structural data types. The default value is true.
parquet_use_bloom_filter #
SET SESSION catalog_name.parquet_use_bloom_filter = true;
Specifies whether Bloom filters are used for predicate pushdown when reading Parquet files. The default value is true.
parquet_vectorized_decoding_enabled #
SET SESSION catalog_name.parquet_vectorized_decoding_enabled = true;
Enable the use of the Java Vector API for faster decoding of Parquet files. The default value is true.
parquet_writer_batch_size #
SET SESSION catalog_name.parquet_writer_batch_size = 10000;
Maximum number of rows processed by the Parquet writer in a batch. The default value is 10000.
parquet_writer_block_size #
SET SESSION catalog_name.parquet_writer_block_size = '128MB';
The maximum block size created by the Parquet writer. The default value is 128MB.
parquet_writer_page_size #
SET SESSION catalog_name.parquet_writer_page_size = '1MB';
The maximum page size created by the Parquet writer. The default value is 1MB.
projection_pushdown_enabled #
SET SESSION catalog_name.projection_pushdown_enabled = true;
Enable projection pushdown. The default value is true.
remove_orphan_files_min_retention #
SET SESSION catalog_name.remove_orphan_files_min_retention = '7d';
The minimum retention for the files taken into account for removal by the remove_orphan_files procedure. The default value is 7d.
sorted_writing_enabled #
SET SESSION catalog_name.sorted_writing_enabled = true;
Enable sorted writes. The default value is true.
statistics_enabled #
SET SESSION catalog_name.statistics_enabled = true;
Enables table statistics for performance improvements. The default value is true. If the value is set to false, the extended_statistics_enabled session property is disabled.
target_max_file_size #
SET SESSION catalog_name.target_max_file_size = '1GB';
Target maximum size of written files. The default value is 1GB.
use_file_size_from_metadata #
SET SESSION catalog_name.use_file_size_from_metadata = true;
Use the file size stored in Iceberg metadata. The default value is true.
Iceberg SQL support #
When using the Iceberg table format with Great Lakes connectivity, the general SQL support details apply.