Hive metastore configuration#

The starburst-hive Helm chart configures a Hive Metastore Service (HMS) and optionally the backing database in the cluster with the values.yaml file detailed in the following sections.

A minimal values file adds the registry credentials and overrides any defaults to suitable values.
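
For example, a minimal values file might look like the following sketch; the registry credentials and the database password are placeholders for your own values:

registryCredentials:
  enabled: true
  registry: "harbor.starburstdata.net"
  username: "my-registry-user"
  password: "my-registry-password"

database:
  internal:
    databasePassword: "MyOwnPassword123"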

Using the HMS#

The expose section configures the DNS availability of the HMS in the cluster. By default the HMS is available at the hostname hive and port 9083. As a result the Thrift URL within the cluster is thrift://hive:9083.

You can use the URL for any catalog:

catalog:
  datalake: |
    connector.name=hive-hadoop2
    hive.metastore.uri=thrift://hive:9083

Docker image and registry#

This section works the same as the Docker image and registry section for the Helm chart for SEP:

image:
  repository: "harbor.starburstdata.net/starburstdata/hive"
  tag: "354.0.0"
  pullPolicy: "IfNotPresent"

registryCredentials:
  enabled: false
  registry:
  username:
  password:

imagePullSecrets:
  - name:

Exposing the pod to outside network#

The expose section for the HMS works identically to the SEP server expose section. Differences are limited to the configured default values. The default type is clusterIp. When you change this configuration, make sure to adapt your configured catalogs to use the correct Thrift URL for the HMS.

expose:
  type: "clusterIp"
  clusterIp:
    name: "hive"
    ports:
      http:
        port: 9083

expose:
  type: "nodePort"
  nodePort:
    name: "hive"
    ports:
      http:
        port: 9083
        nodePort: 30083

expose:
  type: "ingress"
  ingress:
    tls:
      enabled: true
      secretName:
    host:
    path: /
    annotations: {}
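
For example, if you change clusterIp.name to a hypothetical value such as metastore, any catalog must reference the resulting service name in its Thrift URL:

expose:
  type: "clusterIp"
  clusterIp:
    name: "metastore"
    ports:
      http:
        port: 9083

catalog:
  datalake: |
    connector.name=hive-hadoop2
    hive.metastore.uri=thrift://metastore:9083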

Database backend for HMS#

The database backend for HMS is a PostgreSQL database internal to the cluster by default:

database:
  type: internal
  internal:
    image:
      repository: "library/postgres"
      tag: "10.6"
      pullPolicy: "IfNotPresent"
    volume:
      # use one of:
      # - existingVolumeClaim to specify existing PVC
      # - persistentVolumeClaim to specify spec for new PVC
      # - other volume type inline configuration, e.g. emptyDir
      # Examples:
      # existingVolumeClaim: "my_claim"
      # persistentVolumeClaim:
      #  storageClassName:
      #  accessModes:
      #    - ReadWriteOnce
      #  resources:
      #    requests:
      #      storage: "2Gi"
      emptyDir: {}
    resources:
      requests:
        memory: "1Gi"
        cpu: 2
      limits:
        memory: "1Gi"
        cpu: 2
    driver: "org.postgresql.Driver"
    port: 5432
    databaseName: "hive"
    databaseUser: "hive"
    databasePassword: "HivePass1234"
    envFrom: []
    env: []

database.internal.envFrom: YAML sequence of mappings that defines a Secret or ConfigMap as a source of environment variables for the internal PostgreSQL container.

database.internal.env: YAML sequence of mappings, each with name and value keys, that defines environment variables for the internal PostgreSQL container.

For example, OpenShift deployments often cannot pull the default library/postgres image from the Docker registry. You can replace it with an image from the Red Hat registry, which requires additional environment variables set with the database.internal.env parameter:

database:
  type: internal
  internal:
    image:
      repository: "registry.redhat.io/rhscl/postgresql-96-rhel7"
      tag: "latest"
    env:
      - name: POSTGRESQL_DATABASE
        value: "hive"
      - name: POSTGRESQL_USER
        value: "hive"
      - name: POSTGRESQL_PASSWORD
        value: "HivePass1234"

Another option is to create a Secret (for example, postgresql-secret) containing the PostgreSQL variables from the previous code block, and pass it to the container with the envFrom parameter:

database:
  type: internal
  internal:
    image:
      repository: "registry.redhat.io/rhscl/postgresql-96-rhel7"
      tag: "latest"
    envFrom:
      - secretRef:
          name: postgresql-secret
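
For reference, such a Secret can be created from a manifest like the following sketch; the variable names match the previous example and the values are placeholders:

apiVersion: v1
kind: Secret
metadata:
  name: postgresql-secret
type: Opaque
stringData:
  POSTGRESQL_DATABASE: "hive"
  POSTGRESQL_USER: "hive"
  POSTGRESQL_PASSWORD: "HivePass1234"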

Alternatively, you can use an external PostgreSQL or MySQL database by setting database.type to external and configuring the nested properties:

database:
  type: external
  external:
    jdbcUrl:
    driver:
    user:
    password:

database.external.jdbcUrl: JDBC URL to connect to the external database, as required by the database and the configured driver, including hostname and port. Ensure you use a valid JDBC URL as required by the PostgreSQL or MySQL driver. Typically the syntax requires the host, port, and database name: jdbc:postgresql://host:port/database or jdbc:mysql://host:port/database.

database.external.driver: Valid values are com.mysql.jdbc.Driver for an external MySQL or compatible database or org.postgresql.Driver for a PostgreSQL database.

database.external.user: Database user name to access the external database using JDBC.

database.external.password: Password for the user configured to access the external database using JDBC.
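
For example, a configuration for a hypothetical external PostgreSQL database could look like the following sketch; the hostname, database name, and credentials are placeholders:

database:
  type: external
  external:
    jdbcUrl: "jdbc:postgresql://postgresql.example.com:5432/hive"
    driver: "org.postgresql.Driver"
    user: "hive"
    password: "HivePass1234"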

Server start up configuration#

You can create a startup shell script to customize how HMS is started, and pass additional arguments to it.

initFile:
extraArguments:

initFile

A shell script to run before HMS is launched. The content of the file has to be an inline string in the YAML file. The original startup command is passed as the first argument. The script needs to invoke it at the end as exec "$@". Use exec "$1" if passing any extra arguments.

extraArguments

List of extra arguments to pass to the initFile script.

The following example shows how you can use initFile to run a custom startup script. The init script must end with exec "$@":

initFile: |
  #!/bin/bash
  echo "Custom init for $2"
  exec "$@"
extraArguments:
  - TEST_ARG

Additional volumes#

Additional volumes can be necessary for persisting files. These can be defined in the additionalVolumes section. None are defined by default:

additionalVolumes: []

You can add one or more volumes, as supported by Kubernetes, to all nodes in the cluster.

If you specify only path, a directory named in path is created. When mounting a ConfigMap or Secret, files are created in this directory for each key.

This also supports an optional subPath parameter, which takes a key in the ConfigMap or Secret volume you create. If you specify subPath, the key named subPath from the ConfigMap or Secret is mounted as a file with the name provided by path.

additionalVolumes:
  - path: /mnt/InContainer
    volume:
      emptyDir: {}
  - path: /etc/hive/conf/test_config.txt
    subPath: test_config.txt
    volume:
      configMap:
        name: "configmap-in-volume"

Storage#

The chart allows you to configure the credentials to access HDFS or object storage. The credentials enable the HMS to access the storage for metadata information, including statistics gathering.

In addition, you have to configure the catalog with sufficient, corresponding credentials.

The default configuration includes no credentials:

hdfs:
  hadoopUserName:
objectStorage:
  awsS3:
    region:
    endpoint:
    accessKey:
    secretKey:
    pathStyleAccess: false
  gs:
    cloudKeyFileSecret:
  azure:
    abfs:
      authType: "accessKey"
      accessKey:
        storageAccount:
        accessKey:
      oauth:
        clientId:
        secret:
        endpoint:
    wasb:
      storageAccount:
      accessKey:
  adl:
    oauth2:
      clientId:
      credential:
      refreshUrl:

hdfs.hadoopUserName: User name for Hadoop HDFS access

objectStorage.awsS3.*: Configuration for AWS S3 access

objectStorage.awsS3.region: AWS region name

objectStorage.awsS3.endpoint: AWS S3 endpoint

objectStorage.awsS3.accessKey: Name of the access key for AWS S3

objectStorage.awsS3.secretKey: Name of the secret key for AWS S3

objectStorage.awsS3.pathStyleAccess: Set to true to use path-style access for the S3 endpoint

objectStorage.gs.*: Configuration for Google Storage access

objectStorage.gs.cloudKeyFileSecret: Name of the secret with the file containing the access key to the cloud storage. The key of the secret must be named key.json

objectStorage.azure.*: Configuration for Microsoft Azure storage systems

objectStorage.azure.abfs.*: Configuration for Azure Blob Filesystem (ABFS)

objectStorage.azure.abfs.authType: Authentication type to access ABFS. Valid values are accessKey or oauth, with configuration in the following properties.

objectStorage.azure.abfs.accessKey.*: Configuration for access key authentication to ABFS

objectStorage.azure.abfs.accessKey.storageAccount: Name of the ABFS account to access

objectStorage.azure.abfs.accessKey.accessKey: Actual access key to use for ABFS access

objectStorage.azure.abfs.oauth.*: Configuration for OAuth authentication to ABFS

objectStorage.azure.abfs.oauth.clientId: Client identifier for OAuth authentication.

objectStorage.azure.abfs.oauth.secret: Secret for OAuth.

objectStorage.azure.abfs.oauth.endpoint: Endpoint URL for OAuth.

objectStorage.azure.wasb.*: Configuration for Windows Azure Storage Blob (WASB)

objectStorage.azure.wasb.storageAccount: Name of the storage account to use for WASB.

objectStorage.azure.wasb.accessKey: Key to access WASB.

objectStorage.adl.*: Configuration for Azure Data Lake (ADL)

objectStorage.adl.oauth2.*: Configuration for OAuth authentication to ADL

objectStorage.adl.oauth2.clientId: Client identifier for OAuth access to ADL

objectStorage.adl.oauth2.credential: Credential for OAuth access to ADL

objectStorage.adl.oauth2.refreshUrl: Refresh URL for the OAuth access to ADL.
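
As an example, the following sketch configures access to AWS S3; the region and keys are placeholders for your own values:

objectStorage:
  awsS3:
    region: "us-east-2"
    accessKey: "AKIAIOSFODNN7EXAMPLE"
    secretKey: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
    pathStyleAccess: false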

Server configuration#

The following default values configure the container resources for the HMS and the percentage of the container memory assigned to the JVM heap:

heapSizePercentage: 85

resources:
  requests:
    memory: "1Gi"
    cpu: 1
  limits:
    memory: "1Gi"
    cpu: 1

Node assignment#

You can add configuration to determine the node and pod to use:

nodeSelector: {}
tolerations: []
affinity: {}
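
For example, the following sketch schedules the HMS pod on nodes with a hypothetical label and tolerates a matching taint:

nodeSelector:
  disktype: ssd
tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "hms"
    effect: "NoSchedule"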

Annotations#

You can add configuration to annotate the deployment and pod:

deploymentAnnotations: {}
podAnnotations: {}
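
For example, with hypothetical annotation keys and values:

deploymentAnnotations:
  team: data-platform
podAnnotations:
  prometheus.io/scrape: "false"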

Environment variables#

You can pass environment variables to the HMS container using the same mechanism used for the internal database:

envFrom: []
env: []

Both are specified as sequences of mappings, for example:

envFrom:
  - secretRef:
      name: my-secret-with-vars
env:
  - name: MY_VARIABLE
    value: some-value