Starburst Galaxy

  Hudi table format

    Great Lakes connectivity abstracts the details of using different table formats and file types when using object storage catalogs.

    This page describes the features specific to the Hudi table format when used with Great Lakes connectivity.

    Hudi tables are read-only

    Great Lakes connectivity supports Hudi tables in read-only mode. Existing Hudi tables detected in a Galaxy-connected object storage location are read automatically.

    Galaxy cannot create new Hudi tables or write to them.
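
    As a minimal sketch of what read-only access looks like in practice, the following query reads from an existing Hudi table. The names example_catalog, example_schema, and example_hudi_table are hypothetical placeholders, not names defined by Galaxy; substitute the catalog, schema, and table detected in your own object storage location.

    ```sql
    -- Hypothetical names: replace with your own catalog, schema, and table.
    -- Reading an existing Hudi table is supported.
    SELECT *
    FROM example_catalog.example_schema.example_hudi_table
    LIMIT 10;
    ```

    Statements that create or modify Hudi tables, such as CREATE TABLE or INSERT, are not expected to succeed because of the read-only restriction described above.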

    Session properties

    See session properties on the Great Lakes connectivity page to understand how these properties are used.

    The following table describes the session properties supported only by the Hudi table format.

    Session property | Description
    parquet_optimized_nested_reader_enabled | Specifies whether batched column readers are used when reading ARRAY, MAP, and ROW types from Parquet files for improved performance. Set this property to false to disable the optimized Parquet reader for structural data types. The default value is true.
    parquet_optimized_reader_enabled | Specifies whether batched column readers are used when reading Parquet files for improved performance. Set this property to false to disable the optimized Parquet reader. The default value is true.
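
    The following sketch shows one way these properties might be set for the current session, assuming the standard Trino SET SESSION syntax for catalog session properties applies to your cluster. The catalog name example_catalog is a hypothetical placeholder; see the Great Lakes connectivity page for how session properties are used in Galaxy.

    ```sql
    -- example_catalog is a hypothetical catalog name; substitute your own.

    -- Disable the optimized Parquet reader for the current session.
    SET SESSION example_catalog.parquet_optimized_reader_enabled = false;

    -- Disable the optimized reader only for ARRAY, MAP, and ROW types.
    SET SESSION example_catalog.parquet_optimized_nested_reader_enabled = false;

    -- List the current values of matching session properties.
    SHOW SESSION LIKE 'example_catalog.parquet%';

    -- Restore a property to its default value (true).
    RESET SESSION example_catalog.parquet_optimized_reader_enabled;
    ```

    Session properties apply only to the session in which they are set, so this approach can be useful for comparing query behavior with and without the optimized reader before changing anything more broadly.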

    Hudi SQL support

    When using the Hudi table format with Great Lakes connectivity, the general SQL support details apply, with the following additional consideration.