Deploy Hive Metastore with Kubernetes #
This topic covers deploying a Hive metastore service (HMS) with the HMS Helm chart. This deployment is required if you query object storage with catalogs such as Hive or Iceberg and you are not using an alternate metastore such as AWS Glue.
This topic assumes you are familiar with the HMS and how it is used, as well as with Helm charts and Kubernetes (k8s) tools such as kubectl. Ensure that you are familiar with the following Starburst Enterprise Kubernetes topics before configuring and deploying the HMS:
- Kubernetes best practices
- Kubernetes requirements
Overview #
You can deploy HMS using the Starburst Kubernetes (K8s) Helm chart for use with SEP on supported Kubernetes services.
This section describes the process for deploying HMS into your environment. Our reference documentation contains a complete listing of configuration properties and additional customization options in the Helm chart.
Configure the Hive Metastore #
There are several top-level nodes in the HMS Helm chart that you must modify for a minimum HMS configuration:
serviceAccountName:
resources:
database:
expose:
hiveMetastoreWarehouseDir:
hdfs:
objectStorage:
If you use TLS, you must also account for it in the configuration. This section covers the four configuration steps for getting started: resources and service account, the backing database, storage, and optional TLS. Our reference documentation provides details about the content of the HMS Helm chart, including YAML sections not discussed here.
As with SEP, we strongly suggest that you initially deploy HMS with the minimum configuration described in this topic, and ensure that it deploys and is accessible before making any additional customizations described in our reference documentation.
Keep these customizations in a values file, referred to in this topic as hms-values.yaml, that is used in the Helm upgrade command.
Before you begin #
Get the latest starburst-hive Helm chart as described in our installation guide with the configured registry access.
Configure resources and service account #
Ensure that the following top-level nodes of the Helm chart have the correct values for your environment:
serviceAccountName:
- We strongly recommend using a service account for the pod.
resources:
- Ensure that the CPU and memory sizes are appropriate for your instance type.
Leave heapSizePercentage: at the default value.
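As a rough illustration, these two nodes might look like the following in hms-values.yaml. The service account name and the CPU and memory figures are placeholder assumptions, and the requests/limits layout follows the standard Kubernetes resources convention. Check the reference documentation for the exact keys the chart accepts.

```yaml
# Hypothetical values; replace them with ones that match your environment.
serviceAccountName: "hms-service-account"  # assumed name of an existing service account

resources:
  requests:
    memory: "1Gi"   # placeholder size, not a recommendation
    cpu: 1
  limits:
    memory: "1Gi"
    cpu: 1
```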
Configure the PostgreSQL backing database #
The configuration properties for the PostgreSQL database are found in the database: top-level node. As a minimal customization, you must ensure that the following are set correctly for your environment:
database:
  type: "internal"
  internal:
    port: 5432
    databaseName: "hive"
    databaseUser: "Hive"
    databasePassword: "HivePassw0rd1234"
You must also configure volume: persistence and resources, as well as the resources: for the backing database itself, in the database: node. For a complete list of available backing database properties, see our reference documentation.
The database.resources: node is separate from the top-level resources: node. It defines the resources available to the backing database itself, not the HMS server.
Configure storage location and account, and object storage authentication #
The hiveMetastoreWarehouseDir:, hdfs:, and objectStorage: top-level nodes are empty by default.
In the hdfs: top-level node of the Helm chart, add the hadoopUserName: used to connect to the Hive site defined in the hiveMetastoreWarehouseDir: top-level node to query and create objects.
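For illustration, the warehouse directory and Hadoop user might be filled in as follows. The bucket path and user name are hypothetical placeholders, not defaults from the chart.

```yaml
# Hypothetical values; substitute your own storage location and user.
hiveMetastoreWarehouseDir: "s3://example-bucket/warehouse/"
hdfs:
  hadoopUserName: "hive"
```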
There are several templates for configuring object storage in the objectStorage: node. For example, you can define how to connect to S3:
objectStorage:
  awsS3:
    region:
    endpoint:
    accessKey:
    secretKey:
    pathStyleAccess: false
There are also templates for Azure and Azure Data Lake, and Google object storage. Secrets are specified directly in the HMS chart.
For a complete list of storage-related configuration options, see our reference documentation.
Configure TLS (optional) #
If your organization uses TLS, you can enable and configure your HMS to work with it. The most straightforward way to handle TLS is to terminate TLS at the load balancer or ingress, using a signed certificate. We strongly suggest this method, which requires no additional configuration in the HMS.
If you choose not to handle TLS using that method, you can instead configure it in the expose: top-level node of the HMS Helm chart:
expose:
  type: "[clusterIp|nodePort|loadBalancer|ingress]"
Refer to our reference documentation for full details on configuring each of these types.
The default expose: type is clusterIp. However, this is not suitable for production environments. If you need help choosing which type is best, refer to the expose: documentation for SEP.
Deploy the HMS #
When the HMS is configured, run the following command to deploy it. In this example, the minimal values YAML file with the registry credentials, named registry-access.yaml, is used along with the hms-values.yaml file containing the HMS customizations:
$ helm upgrade hms starburst/starburst-hive \
    --install \
    --values ./registry-access.yaml \
    --values ./hms-values.yaml
Once the pod is deployed, other services can use this HMS, if needed.
Next steps #
- Complete your HMS configuration
- Add the metastore configuration property to any Hive, Iceberg or Delta Lake catalogs you create. Refer to the specific connector documentation.
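As a sketch of the second step, a Hive catalog defined in the SEP values file could point at the deployed metastore as shown below. The catalogs: node layout, the catalog name datalake, the service host name hive, and port 9083 (the conventional HMS Thrift port) are all assumptions here. Use the service name and port that your HMS deployment actually exposes, and confirm the catalog syntax in the connector documentation.

```yaml
# Hypothetical catalog definition; names and host are placeholders.
catalogs:
  datalake: |
    connector.name=hive
    hive.metastore.uri=thrift://hive:9083
```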