Hudi table format #
Great Lakes connectivity abstracts the details of using different table formats and file types when using object storage catalogs.
This page describes the features specific to the Hudi table format when used with Great Lakes connectivity.
Hudi tables are read-only #
Hudi tables have read-only support. Existing tables of the Hudi format that are detected in a Galaxy-connected object storage location are read automatically.
Galaxy cannot create new Hudi tables or write to them.
Metadata tables #
Great Lakes connectivity exposes several metadata tables for the Hudi table format. These metadata tables contain information about the internal structure of the Hudi table. Query each metadata table by appending the metadata table name to the table_name:
SELECT * FROM catalog_name.schema_name."table_name$timeline";
$timeline #
The $timeline table provides a detailed view of metadata instants in the Hudi table. Instants are specific points in time.
The following table describes the columns in the $timeline table query output:
Name | Type | Description |
---|---|---|
timestamp | VARCHAR | The instant time: the timestamp at which the action was performed. |
action | VARCHAR | The type of action performed on the table. |
state | VARCHAR | The current state of the instant. |
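As a sketch of how the columns above can be used, the following query lists the most recent completed instants. The placeholders catalog_name, schema_name, and table_name follow the earlier example, and the literal 'COMPLETED' is an assumption about how Hudi reports a finished instant in the state column; check the values in your own table before relying on it.

```sql
-- List the ten most recent instants that have finished.
-- 'COMPLETED' is an assumed state value; verify it against your table.
SELECT "timestamp", action, state
FROM catalog_name.schema_name."table_name$timeline"
WHERE state = 'COMPLETED'
ORDER BY "timestamp" DESC
LIMIT 10;
```

The timestamp column is a VARCHAR, so the ordering is lexicographic. This matches chronological order only as long as all instant times use the same fixed-width format.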
Session properties #
A session property lets a user temporarily override a configuration property for the duration of the current connection session to the cluster. Use the SET SESSION statement to assign the property a value, such as true or false:
SET SESSION catalog_name.session_property = expression;
Use the SHOW SESSION statement to view all current session properties. For additional information, read about the SET SESSION and RESET SESSION SQL statements.
Catalog session properties are connector-defined session properties that can be set on a per-catalog basis. These properties must be set separately for each catalog by including the catalog name before the property name, for example, catalog_name.property_name.
Session properties are linked to the current session, so a user can have multiple connections to a cluster that each have different values for the same session properties. Once a session ends, either by disconnecting or creating a new session, any changes made to session properties during the previous session are lost.
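The statements described above can be combined into a short set, inspect, and reset sequence. This is a minimal sketch: catalog_name is a placeholder, and the property used is one of the Hudi catalog session properties documented on this page.

```sql
-- Override the property for the current session only.
SET SESSION catalog_name.parquet_optimized_reader_enabled = false;

-- Confirm the change; the LIKE pattern narrows the output to matching properties.
SHOW SESSION LIKE 'catalog_name.parquet%';

-- Restore the default value without ending the session.
RESET SESSION catalog_name.parquet_optimized_reader_enabled;
```

Because the change is scoped to the session, other connections to the same cluster keep their own values for the property.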
The following sections describe the session properties supported by the Hudi table format:
parquet_optimized_reader_enabled #
SET SESSION catalog_name.parquet_optimized_reader_enabled = true;
Specifies whether batched column readers are used when reading Parquet files for improved performance. Set this property to false to disable the optimized Parquet reader. The default value for parquet_optimized_reader_enabled is true.
parquet_optimized_nested_reader_enabled #
SET SESSION catalog_name.parquet_optimized_nested_reader_enabled = true;
Specifies whether batched column readers are used when reading ARRAY, MAP, and ROW types from Parquet files for improved performance. Set this property to false to disable the optimized Parquet reader for structural data types. The default value is true.
Hudi SQL support #
When using the Hudi table format with Great Lakes connectivity, the general SQL support details apply, with the following additional consideration:
- Write operations, including data management and schema and table management, are not supported.