Deploy Hive Metastore with Kubernetes #

This topic covers deploying a Hive Metastore Service (HMS) with the HMS Helm chart. This deployment is required if you query object storage with connectors such as Hive or Iceberg and you are not using an alternative metastore such as AWS Glue.

Overview #

You can deploy HMS using the Starburst Kubernetes (K8s) Helm chart for use with Starburst Enterprise platform (SEP) on supported Kubernetes services.

This section describes the process for deploying HMS into your environment. Our reference documentation contains a complete listing of configuration properties and additional customization options in the Helm chart.

Configure the Hive Metastore #

There are several top-level nodes in the HMS Helm chart that you must modify for a minimum HMS configuration:

  • serviceAccountName:
  • resources:
  • database:
  • expose:
  • hiveMetastoreWarehouseDir:
  • hdfs:
  • objectStorage:

If you use TLS, you must account for it as well. This section covers getting started with the four configuration steps that follow: resources and service account, the backing database, storage, and TLS. Our reference documentation provides details about the content of the HMS Helm chart, including YAML sections not discussed here.

As with SEP, we strongly suggest that you initially deploy HMS with the minimum configuration described in this topic, and ensure that it deploys and is accessible before making any additional customizations described in our reference documentation.

Before you begin #

Get the latest starburst-hive Helm chart, with registry access configured, as described in our installation guide.

Configure resources and service account #

Ensure that the following top-level nodes of the Helm chart have the correct values to reflect your environment:

  • serviceAccountName: - We strongly recommend using a service account for the pod.
  • resources: - Ensure that the CPU and memory sizes are appropriate for your instance type.
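
As a minimal sketch, these two nodes might look like the following in your HMS values file. The service account name and the CPU and memory sizes are hypothetical placeholders, not recommendations, and the layout under resources: assumes the standard Kubernetes requests/limits structure, so verify it against the reference documentation for the chart.

```yaml
# Hypothetical values; replace them with ones that fit your cluster.
serviceAccountName: hms-service-account   # assumed name of an existing service account

resources:
  requests:          # what the scheduler reserves for the HMS pod
    memory: "1Gi"
    cpu: 1
  limits:            # upper bound the pod may consume
    memory: "1Gi"
    cpu: 1
```

Keeping requests and limits equal, as in this sketch, is one common choice for predictable scheduling; it is not a requirement stated by the chart.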

Configure the PostgreSQL backing database #

The configuration properties for the PostgreSQL database are found in the database: top-level node. As a minimal customization, you must ensure that the following are set correctly for your environment:

database:
  type: "internal"
  internal:
    port: 5432
    databaseName: "hive"
    databaseUser: "Hive"
    databasePassword: "HivePassw0rd1234"

In the database: node, you must also configure persistence and resources for the volume:, as well as the resources: for the backing database itself. For a complete list of available backing database properties, see our reference documentation.

Configure storage location and account, and object storage authentication #

The default configuration for the hiveMetastoreWarehouseDir:, hdfs:, and objectStorage: top-level nodes is empty.

In the hdfs: top-level node of the Helm chart, add the hadoopUserName: used to connect to the Hive site defined in the hiveMetastoreWarehouseDir: top-level node to query and create objects.
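
For illustration, the two nodes described above might be filled in as follows. The bucket path and the user name are hypothetical, and the S3-style URI is an assumption about the location format; use the location and user that apply to your storage.

```yaml
# Hypothetical warehouse location and Hadoop user; adjust for your environment.
hiveMetastoreWarehouseDir: "s3://example-bucket/warehouse/"

hdfs:
  hadoopUserName: "hive"   # user that queries and creates objects in the warehouse location
```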

There are several templates for configuring object storage in the objectStorage: node. For example, you can define how to connect to S3:

objectStorage:
  awsS3:
    region:
    endpoint:
    accessKey:
    secretKey:
    pathStyleAccess: false

There are also templates for Azure and Azure Data Lake, and Google object storage. Secrets are specified directly in the HMS chart.
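
A filled-in version of the S3 template could look like the following sketch. All values are placeholders. The endpoint is left empty on the assumption that it is only needed for a non-default or S3-compatible endpoint; confirm this in the reference documentation.

```yaml
objectStorage:
  awsS3:
    region: "us-east-1"              # hypothetical region
    endpoint: ""                     # assumed optional; set for S3-compatible stores
    accessKey: "<your-access-key>"   # placeholder, never commit real keys
    secretKey: "<your-secret-key>"   # placeholder, never commit real keys
    pathStyleAccess: false
```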

For a complete list of storage-related configuration options, see our reference documentation.

Configure TLS (optional) #

If your organization uses TLS, you can enable and configure your HMS to work with it. The most straightforward way to handle TLS is to terminate TLS at the load balancer or ingress, using a signed certificate. We strongly suggest this method, which requires no additional configuration in the HMS.

If you choose not to handle TLS using that method, you can instead configure it in the expose: top-level node of the HMS Helm chart:

expose:
  type: "[clusterIp|nodePort|loadBalancer|ingress]"

Refer to our reference documentation for full details on configuring each of these types. The default expose: type is clusterIp, which is not suitable for production environments. If you need help choosing which type is best, refer to the expose: documentation for SEP.
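
As a rough sketch, a clusterIp configuration might look like the following. Only the type: key appears in this topic; the nested clusterIp: keys, the service name hive, and the port are assumptions based on the conventional HMS Thrift port 9083, so check them against the reference documentation before use.

```yaml
expose:
  type: "clusterIp"
  clusterIp:          # assumed sub-node matching the selected type
    name: "hive"      # assumed service name
    ports:
      http:
        port: 9083    # conventional HMS Thrift port
```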

Deploy the HMS #

When the HMS is configured, run the following command to deploy it. In this example, the minimal values YAML file with the registry credentials, named registry-access.yaml, is used along with the hms-values.yaml file containing the HMS customizations:

$ helm upgrade hms starburst/starburst-hive \
    --install \
    --values ./registry-access.yaml \
    --values ./hms-values.yaml

Once the pod is deployed, other services can use this HMS, if needed.
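
For example, an SEP catalog could point at the deployed HMS with a fragment like the one below in the SEP values file. This is a sketch under assumptions: the catalogs: node, the catalog name hive, and the service address thrift://hive:9083 are not taken from this topic, so substitute the service name and port that your HMS deployment actually exposes.

```yaml
catalogs:
  hive: |
    connector.name=hive
    # Assumed in-cluster service name and port of the HMS deployment
    hive.metastore.uri=thrift://hive:9083
```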

Next steps #