Hudi table format #
This page describes the features specific to the Hudi table format when used with Great Lakes connectivity.
Hudi tables are read-only #
Hudi tables have read-only support. Existing tables of the Hudi format that are detected in a Galaxy-connected object storage location are read automatically.
Galaxy cannot create new Hudi tables or write to them.
Metadata tables #
Great Lakes connectivity exposes several metadata tables
for the Hudi table format. These metadata tables contain information about
the internal structure of the Hudi table. Query each metadata table by
appending the metadata table name to the table name:
SELECT * FROM catalog_name.schema_name."table_name$timeline";
The $timeline table provides a detailed view of metadata instants in the Hudi
table. Instants are specific points in time.
The following table describes the columns returned by a
$timeline table query:
||VARCHAR||The instant time, a timestamp of when the action was performed.|
||VARCHAR||The type of action performed on the table.|
||VARCHAR||The current state of the instant.|
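As a rough illustration of how the $timeline table might be used, the following sketch lists the most recent completed instants. The catalog, schema, and table names are placeholders. The column names (timestamp, action, state) and the COMPLETED state value are assumptions based on the open source Trino Hudi connector, so check the actual output of a SELECT * query before relying on them.

```sql
-- Placeholder names: example_catalog, example_schema, example_table.
-- Assumed column names: "timestamp", action, state (not confirmed on this page).
SELECT "timestamp", action, state
FROM example_catalog.example_schema."example_table$timeline"
WHERE state = 'COMPLETED'      -- assumed state value; adjust to what your table reports
ORDER BY "timestamp" DESC      -- instant times are VARCHAR, so this sorts lexically
LIMIT 10;
```

The metadata table name must stay inside the double quotes together with the table name, because the $ character is not valid in an unquoted identifier.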
Session properties #
A session property lets a user temporarily modify a configuration property for the duration of the current connection session to the cluster. Use the SET SESSION statement followed by the property name and a value such as false to modify the property:
SET SESSION catalog_name.session_property = expression;
Catalog session properties are connector-defined session properties that can
be set on a per-catalog basis. These properties must be set separately for each
catalog by including the catalog name before the property name, for example,
catalog_name.session_property.
Session properties are linked to the current session, so a user can have multiple connections to a cluster that each have different values for the same session properties. Once a session ends, either by disconnecting or creating a new session, any changes made to session properties during the previous session are lost.
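The session-scoped behavior described above can be sketched as a short sequence of statements. This is only an illustration: example_catalog is a placeholder catalog name, and the SHOW SESSION and RESET SESSION statements are standard Trino SQL that is assumed, not confirmed by this page, to be available in your cluster.

```sql
-- Override a catalog session property for the current session only.
SET SESSION example_catalog.parquet_optimized_reader_enabled = false;

-- Inspect the current and default values of matching session properties.
SHOW SESSION LIKE 'example_catalog.parquet%';

-- Return the property to its default without ending the session.
RESET SESSION example_catalog.parquet_optimized_reader_enabled;
```

Because the override lives in the session, a second connection opened at the same time still sees the default value unless it runs its own SET SESSION statement.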
The following sections describe the properties supported by the Hudi table type:
SET SESSION catalog_name.parquet_optimized_reader_enabled = true;
Specifies whether batched column readers are used when reading Parquet files for
improved performance. Set this property to
false to disable the optimized
Parquet reader. The default value for
SET SESSION catalog_name.parquet_optimized_nested_reader_enabled = true;
Specifies whether batched column readers are used when reading
ROW types from Parquet files for improved performance. Set this property to
false to disable the optimized Parquet reader for structural data types.
The default value is
Hudi SQL support #
When using the Hudi table format with Great Lakes connectivity, the general SQL support details apply, with the following additional consideration.