Configuring Starburst Enterprise with Ranger in Kubernetes#

Looking for the installation guide? This topic covers configuring Apache Ranger with Starburst Enterprise after you have completed a basic install of Starburst Enterprise. If you have not yet completed the basic install, go to our installation guide.


The starburst-ranger Helm chart configures Apache Ranger 2.3.0 usage in the cluster with the values.yaml file detailed in the following sections. It allows you to implement global access control or just Hive access control with Ranger for Starburst Enterprise platform (SEP).

It creates the following setup: a Ranger server with a user interface for policy management, an optional user synchronization container for LDAP, and an internal or external backing PostgreSQL database for policy storage.

Use your registry credentials, and follow best practices by creating an override file for changes to default values as desired.
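
For example, a minimal override file, named ranger-overrides.yaml here only as a placeholder, might contain just the registry credentials and leave all other values at the chart defaults:

# ranger-overrides.yaml - example override file with placeholder values
registryCredentials:
  enabled: true
  registry: "harbor.starburstdata.net"
  username: "<your-username>"
  password: "<your-password>"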

Docker registry access#

This works the same as the Docker image and registry section of the Helm chart for SEP.

registryCredentials:
  enabled: false
  registry:
  username:
  password:

imagePullSecrets:
  - name:

Ranger server#

The admin section configures the Ranger server and the included user interface for policy management.

admin:
  image:
    repository: "harbor.starburstdata.net/starburstdata/starburst-ranger-admin"
    tag: "2.3.0-e.4"
    pullPolicy: "IfNotPresent"
  port: 6080
  resources:
    requests:
      memory: "1Gi"
      cpu: 2
    limits:
      memory: "1Gi"
      cpu: 2
  # serviceUser is used by SEP to access Ranger
  serviceUser: "starburst_service"
  passwords:
    admin: "RangerPassword1"
    tagsync: "TagSyncPassword1"
    usersync: "UserSyncPassword1"
    keyadmin: "KeyAdminPassword1"
    service: "StarburstServicePassword1"
  # optional truststore containing CA certificates to use instead of default one
  truststore:
    # existing secret containing truststore.jks key
    secret:
    # password to truststore
    password:
  # Enable the propagation of environment variables from Secrets and Configmaps
  envFrom: []
  env:
    # Additional env variables to pass to Ranger Admin.
    # To pass a Ranger install property, use a variable named RANGER__<property_name>,
    # for example RANGER__authentication_method.
  securityContext: {}
    # Optionally configure a security context for the ranger admin container

admin.serviceUser

The operating system user that is used to run the Ranger application.

admin.passwords

Set these passwords to any desired values. They are used for administrative and Ranger-internal purposes, and do not need to be changed or used elsewhere.
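
A minimal override sketch for this block; the values shown are placeholders that you must replace with your own:

admin:
  serviceUser: "starburst_service"
  passwords:
    admin: "ReplaceWithAdminPassword"
    service: "ReplaceWithServicePassword"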

SEP user synchronization#

SEP can actively sync user names and user groups directly to Ranger, as a simpler alternative to the LDAP user synchronization server.

LDAP user synchronization server#

You can use the usersync block to configure the details of the synchronization of users and groups between Ranger and your LDAP system, as an alternative to direct sync between SEP and Ranger. When deployed, it runs in a separate sidecar container.

The default configuration enables user synchronization:

usersync:
  enabled: true
  image:
    repository: "harbor.starburstdata.net/starburstdata/ranger-usersync"
    tag: "2.3.0-e.4"
    pullPolicy: "IfNotPresent"
  name: "ranger-usersync"
  resources:
    requests:
      memory: "1Gi"
      cpu: 1
    limits:
      memory: "1Gi"
      cpu: 1
  tls:
    # optional truststore containing CA certificate for ldap server
    truststore:
      # existing secret containing truststore.jks key
      secret:
      # password to truststore
      password:
  # Enable the propagation of environment variables from Secrets and Configmaps
  envFrom: []
  # env is a map of ranger config variables
  env:
    # Use RANGER__<property_name> variables to set Ranger install properties.
    RANGER__SYNC_LDAP_URL: "ldap://ranger-ldap:389"
    RANGER__SYNC_LDAP_BIND_DN: "cn=admin,dc=ldap,dc=example,dc=org"
    RANGER__SYNC_LDAP_BIND_PASSWORD: "cieX7moong3u"
    RANGER__SYNC_LDAP_SEARCH_BASE: "dc=ldap,dc=example,dc=org"
    RANGER__SYNC_LDAP_USER_SEARCH_BASE: "ou=users,dc=ldap,dc=example,dc=org"
    RANGER__SYNC_LDAP_USER_OBJECT_CLASS: "person"
    RANGER__SYNC_GROUP_SEARCH_ENABLED: "true"
    RANGER__SYNC_GROUP_USER_MAP_SYNC_ENABLED: "true"
    RANGER__SYNC_GROUP_SEARCH_BASE: "ou=groups,dc=ldap,dc=example,dc=org"
    RANGER__SYNC_GROUP_OBJECT_CLASS: "groupOfNames"
  securityContext:
    # Optionally configure a security context for the ranger usersync container

User synchronization configuration properties#

Node name

Description

usersync.enabled

Enables or disables the user synchronization feature

usersync.name

Name of the pod

usersync.tls.truststore.secret

Name of the secret created from the truststore. This is required if you need to use TLS for usersync.

usersync.tls.truststore.password

Password for the truststore. This is required if you need to use TLS for usersync.

usersync.env

A map of Ranger config variables related to the user synchronization

usersync.env.RANGER__SYNC_LDAP_URL

URL to the LDAP server

usersync.env.RANGER__SYNC_LDAP_BIND_DN

Distinguished name (DN) string used to bind for the LDAP connection

usersync.env.RANGER__SYNC_LDAP_BIND_PASSWORD

Password for the bind DN used for the LDAP connection

usersync.env.RANGER__SYNC_LDAP_SEARCH_BASE

Search base in the LDAP directory

usersync.env.RANGER__SYNC_LDAP_USER_SEARCH_BASE

User information search base in the LDAP directory

usersync.env.RANGER__SYNC_LDAP_USER_OBJECT_CLASS

Object class for users

usersync.env.RANGER__SYNC_GROUP_SEARCH_ENABLED

Enable or disable group search

usersync.env.RANGER__SYNC_GROUP_USER_MAP_SYNC_ENABLED

Enable or disable synchronization of group-user mapping

usersync.env.RANGER__SYNC_GROUP_SEARCH_BASE

Group information search base in the LDAP directory

usersync.env.RANGER__SYNC_GROUP_OBJECT_CLASS

Object class for groups, typically groupOfNames for OpenLDAP or group for Active Directory
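
For example, a sketch of the overrides you might use against Active Directory instead of the OpenLDAP defaults; the URL and search bases are placeholders for your own directory:

usersync:
  env:
    RANGER__SYNC_LDAP_URL: "ldaps://ad.example.com:636"
    RANGER__SYNC_LDAP_USER_SEARCH_BASE: "cn=users,dc=example,dc=com"
    RANGER__SYNC_LDAP_USER_OBJECT_CLASS: "user"
    RANGER__SYNC_GROUP_SEARCH_BASE: "cn=users,dc=example,dc=com"
    RANGER__SYNC_GROUP_OBJECT_CLASS: "group"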

The following steps can be used to enable TLS with the LDAP server:

  • Create a truststore file named truststore.jks from the LDAP server

  • Create a Kubernetes secret ldap-cert from the truststore file

    kubectl create secret generic ldap-cert --from-file truststore.jks
    
  • Update the values to reflect the secret name in the tls section

  • Update the truststore password in the tls section

    tls:
      enabled: true
      truststore:
        secret: ldap-cert
        password: "truststore password"
    

Internal backing database server#

For testing, you can use a PostgreSQL database created by the chart and located within the cluster as the backend for Ranger policy storage. Configure it in the database block.

Note

Alternatively, for production usage you can use an external PostgreSQL database that you must manage yourself.

This section describes YAML nodes provided by default for configuring the internal backing database:

database:
  type: "internal"
  internal:
    image:
      repository: "library/postgres"
      tag: "10.6"
      pullPolicy: "IfNotPresent"
    volume:
      persistentVolumeClaim:
        storageClassName:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: "2Gi"
    resources:
      requests:
        memory: "1Gi"
        cpu: 2
      limits:
        memory: "1Gi"
        cpu: 2
    port: 5432
    databaseName: "ranger"
    databaseUser: "ranger"
    databasePassword: "RangerPass123"
    databaseRootUser: "rangeradmin"
    databaseRootPassword: "RangerAdminPass123"
    securityContext: {}
    envFrom: []
    env: []

Internal backing database server configuration properties#

Node name

Description

database.type

Set to internal to use a database in the k8s cluster, managed by the chart

database.internal.image

Docker container images used for the PostgreSQL server

database.internal.volume

Storage volume to persist the database. The default configuration requests a new persistent volume (PV).

database.internal.volume.persistentVolumeClaim

The default configuration, which requests a new persistent volume (PV).

database.internal.volume.existingVolumeClaim

Alternative volume configuration, which uses an existing volume claim by referencing its name as the value in quotes, for example "my_claim". See the sketch after this table.

database.internal.volume.emptyDir

Alternative volume configuration, which configures an empty directory on the pod. Keep in mind that a pod replacement loses the database content.

database.internal.resources

CPU and memory resource requests and limits for the internal database container

database.internal.databaseName

Name of the internal database

database.internal.databaseUser

User to connect to the internal database

database.internal.databasePassword

Password to connect to internal database

database.internal.databaseRootUser

User to administrate the internal database for creating and updating tables and similar operations.

database.internal.databaseRootPassword

Password for the administrator to connect to the internal database

database.internal.envFrom

YAML sequence of mappings that define a Secret or ConfigMap as a source of environment variables for the internal PostgreSQL container.

database.internal.env

YAML sequence of mappings, each with name and value keys, that define environment variables for the internal PostgreSQL container.
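
A minimal sketch of the volume alternatives described in the table; my_claim is a placeholder for your own claim name, and only one of the variants should be set:

database:
  type: "internal"
  internal:
    volume:
      # use an existing persistent volume claim
      existingVolumeClaim: "my_claim"
      # or use an empty directory that does not survive pod replacement
      # emptyDir: {}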

Examples#

OpenShift deployments often do not have access to pull the default Docker image library/postgres. You can replace it with an image from the Red Hat registry, which requires additional environment variables set with the parameter database.internal.env:

database:
  type: internal
  internal:
    image:
       repository: "registry.redhat.io/rhscl/postgresql-96-rhel7"
       tag: "latest"
    env:
      - name: POSTGRESQL_DATABASE
        value: "hive"
      - name: POSTGRESQL_USER
        value: "hive"
      - name: POSTGRESQL_PASSWORD
        value: "HivePass1234"

Another option is to create a Secret (for example, postgresql-secret) containing the variables needed by PostgreSQL mentioned in the previous code block, and pass it to the container with the envFrom parameter:

database:
  type: internal
  internal:
    image:
       repository: "registry.redhat.io/rhscl/postgresql-96-rhel7"
       tag: "latest"
    envFrom:
      - secretRef:
          name: postgresql-secret

External backing database server#

This section shows the empty default setup for using an external PostgreSQL database. You must provide the necessary details for the external server and ensure that it can be reached from the pods in the k8s cluster.

database:
  type: "external"
  external:
    port:
    host:
    databaseName:
    databaseUser:
    databasePassword:
    databaseRootUser:
    databaseRootPassword:

External backing database server configuration properties#

Node name

Description

database.type

Set to external to use a database managed externally

database.external.port

Port to access the external database

database.external.host

Host of the external database

database.external.databaseName

Name of the database

database.external.databaseUser

User to connect to the database. If the user does not already exist, it is automatically created during installation.

database.external.databasePassword

Password to connect to the database

database.external.databaseRootUser

The existing root user to administrate the external database. It is used to create and update tables and similar operations.

database.external.databaseRootPassword

Password for the administrator to connect to the external database
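
A sketch of a filled-in external database configuration; the host name, user names, and passwords are placeholders for your own environment:

database:
  type: "external"
  external:
    port: 5432
    host: "postgresql.example.com"
    databaseName: "ranger"
    databaseUser: "ranger"
    databasePassword: "RangerPass123"
    databaseRootUser: "postgres"
    databaseRootPassword: "RootPass123"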

Additional volumes#

Additional volumes can be necessary for storing and accessing persisted files. They can be defined in the additionalVolumes section. None are defined by default:

additionalVolumes: []

You can add one or more volumes supported by k8s to all nodes in the cluster.

If you specify path only, a directory named in path is created. When mounting a ConfigMap or Secret, files are created in this directory for each key.

This supports an optional subPath parameter, which takes an optional key in the ConfigMap or Secret volume you create. If you specify subPath, the specific key named in subPath from the ConfigMap or Secret is mounted as a file with the name provided by path.

The following example snippet shows both use cases:

additionalVolumes:
  - path: /mnt/InContainer
    volume:
      emptyDir: {}
  - path: /tmp/config.txt
    subPath: config.txt
    volume:
      configMap:
        name: "configmap-in-volume"

Exposing the cluster to outside network#

The expose section for Ranger works identically to the expose section for SEP. It exposes the Ranger user interface for configuring and managing policies outside the cluster.

Differences are isolated to the configured default values. The default type is clusterIp:

expose:
  type: "clusterIp"
  clusterIp:
    name: "ranger"
    ports:
      http:
        port: 6080

The following section shows the default values with an activated nodePort type:

expose:
  type: "nodePort"
  nodePort:
    name: "ranger"
    ports:
      http:
        port: 6080
        nodePort: 30680

The following section shows the default values with an activated loadBalancer type:

expose:
  type: "loadBalancer"
  loadBalancer:
    name: "ranger"
    IP: ""
    ports:
      http:
        port: 6080
    annotations: {}
    sourceRanges: []

The following section shows the default values with an activated ingress type:

expose:
  type: "ingress"
  ingress:
    ingressName: "ranger-ingress"
    serviceName: "ranger"
    servicePort: 6080
    ingressClassName:
    tls:
      enabled: true
      secretName:
    host:
    path: "/"
    annotations: {}

Datasources#

# datasources - list of SEP datasources to configure Ranger
# services. It is mounted as file /config/datasources.yaml inside
# container and processed by init script.
datasources:
  - name: "fake-starburst-1"
    host: "starburst.fake-starburst-1-namespace"
    port: 8080
    username: "starburst_service1"
    password: "Password123"
  - name: "fake-starburst-2"
    host: "starburst.fake-starburst-2-namespace"
    port: 8080
    username: "starburst_service2"
    password: "Password123"

Server start up configuration#

You can create a startup shell script to customize how Ranger is started, and pass additional arguments to it.

The script receives the container name as an input parameter. Possible values are ranger-admin and ranger-usersync. Additional arguments can be configured with extraArguments.

Startup script nodes#

Node name

Description

initFile

A shell script to run before Ranger is launched. The content of the file has to be an inline string in the YAML file. The script is started as /bin/bash <<init_file>>. When called, it is passed the single parameter value ranger-admin or ranger-usersync depending on the type of pod. The script needs to invoke the launcher script /init/initFile.sh for a successful start of Ranger.

extraArguments

List of extra arguments to be passed to the initFile script.
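
A minimal sketch of a startup configuration, assuming you only want to log the container type and pass a hypothetical debug argument before handing off to the launcher script; passing the arguments through to the launcher is an assumption of this sketch:

initFile: |
  #!/bin/bash
  # $1 is ranger-admin or ranger-usersync; further arguments come from extraArguments
  echo "Starting container type: $1"
  # invoke the launcher script so that Ranger starts successfully
  exec /init/initFile.sh "$@"
extraArguments:
  - debug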

Extra secrets#

You can configure additional secrets that are mounted in the /extra-secret/ path on each container.

extraSecret:
  # Replace this with the name of a secret in the namespace you are deploying to
  name:
  # Optionally, 'file' may be provided, which is deployed as a secret with the given 'name' in that namespace.
  file:

Node assignment#

You can configure your cluster to determine the node and pod to use for the Ranger server:

nodeSelector: {}
tolerations: []
affinity: {}

Our SEP configuration documentation contains examples and resources to help you configure these YAML nodes.
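
For example, a minimal sketch that pins the Ranger pod to nodes carrying a hypothetical label and tolerates a matching taint:

nodeSelector:
  starburstpool: security        # hypothetical node label
tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "ranger"
    effect: "NoSchedule"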

Annotations#

You can add configuration to annotate the deployment and pod:

deploymentAnnotations: {}
podAnnotations: {}
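
A short sketch with placeholder annotation keys and values:

deploymentAnnotations:
  example.com/team: "data-platform"
podAnnotations:
  example.com/monitoring: "enabled"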

Security context#

You can optionally configure security contexts to define privilege and access control settings for the Ranger containers. You can separately configure the security context for the Ranger admin, usersync, and database containers.

securityContext:

If you do not want to set the securityContext for the default service account, you can restrict it by configuring a separate service account for the Ranger pod.

For a restricted environment like OpenShift, you may need to set the AUDIT_WRITE capability for Ranger usersync:

securityContext:
  capabilities:
    add:
      - AUDIT_WRITE

Additionally, OpenShift clusters need the anyuid and privileged security context constraints set for the service account used by Ranger. For example:

oc create serviceaccount <k8s-service-account>
oc adm policy add-scc-to-user anyuid system:serviceaccount:<k8s-namespace>:<k8s-service-account>
oc adm policy add-scc-to-user privileged system:serviceaccount:<k8s-namespace>:<k8s-service-account>

Service account#

You can configure a service account for the Ranger pod using:

serviceAccountName:
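
For example, referencing the service account created with the OpenShift commands in the previous section; the name is a placeholder:

serviceAccountName: "ranger-service-account"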

Environment variables#

You can pass environment variables to the Ranger container using the same mechanism used for the internal database:

envFrom: []
env: []

Both are specified as sequences of mappings, for example:

envFrom:
  - secretRef:
      name: my-secret-with-vars
env:
  - name: MY_VARIABLE
    value: some-value

Enable TLS#

The process of enabling TLS between Ranger and SEP with the Helm chart is identical to the normal process. The keystore files have to be added as additional files.

The configuration in the YAML file has to use the coordinator nodes mechanism to add a properties file, inline the XML configuration file that references the keystore files, and add additional properties to config.properties:

coordinator:
  etcFiles:
    properties:
      access-control-ranger.properties: |
        access-control.name=ranger
        ...
    other:
      ranger-policymgr-ssl.xml: |
        <configuration>
        ...
  additionalProperties: |
    access-control.config-files=/etc/starburst/access-control-ranger.properties

Add the SSL config file path to all catalogs in the Catalogs node using the default path and configured filename:

catalogs:
  examplecatalog: |
    connector.name=...
    ...
    ranger.plugin-policy-ssl-config-file=/etc/starburst/ranger-policymgr-ssl.xml

TLS Encryption with Ranger#

If your organization requires network traffic to be encrypted to the Ranger pod rather than terminating TLS at the load balancer, you must configure the Ranger values.yaml file as shown in the following example:

expose:
  type: "loadBalancer"
  loadBalancer:
    name: "ranger-lb"
    ports:
      http:
        port: 6182

admin:
  port: 6182
  probes:
    readinessProbe:
      httpGet:
        path: /
        port: 6182
        scheme: HTTPS
      failureThreshold: 10
      periodSeconds: 10
    livenessProbe:
      httpGet:
        path: /
        port: 6182
        scheme: HTTPS
  env:
    RANGER__policymgr_http_enabled: false
    RANGER__policymgr_external_url: https://localhost:6182
    RANGER__policymgr_https_keystore_file: /tmp/rangercert/ranger-admin-keystore.jks
    RANGER__policymgr_https_keystore_keyalias: ranger-admin.docker.cluster
    RANGER__policymgr_https_keystore_password: password

additionalVolumes:
  - path: /tmp/rangercert
    volume:
      secret:
        secretName: "ranger-admin-keystore"

The following table explains the relevant YAML sections:

Ranger YAML configuration sections for TLS#

YAML section

Purpose

expose

Configures Ranger to deploy its own load balancer, encrypting network traffic to the Ranger pod. Without this section, TLS encrypted network traffic intended for Ranger is decrypted before reaching Ranger.

admin

Declares the TLS configuration for Ranger’s administration container that processes external traffic. The port must be configured to use the Ranger TLS port number, 6182.

admin.probes

Redefines the readiness and liveness probes (https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes) to use the HTTPS protocol and port 6182.

admin.env

Defines several environment variables used by Ranger:

  • RANGER__policymgr_http_enabled - Must be set to false to enable TLS.

  • RANGER__policymgr_external_url - Defines the URL that the admin policy manager container will listen for traffic on, and must specify port 6182 as part of the URL.

  • RANGER__policymgr_https_keystore_* - These settings define the details of the Java keystore file that is to be used for TLS encryption, including the name of the file used when making the keystore, the keystore alias name, and the keystore’s password.

additionalVolumes

Declares the k8s volume containing the Java keystore for Ranger to use. This must be deployed to the k8s cluster as a secret whose name is given in the secretName setting. The RANGER__policymgr_https_keystore_file setting must match the path in this section. The key name given to the file must match the value used when creating the secret.

More information about working with Java keystore certificates is available in the JKS files and PEM files documentation.

Configure startup probe timeouts#

You can define startup probe timeouts for the Ranger Admin container as well as the user synchronization container. This allows the applications to connect to the database backend and finish starting up. The default timeout of 300 seconds is the result of failureThreshold * periodSeconds, with default values of 30 attempts and 10 seconds.

admin:
  startupProbe:
    failureThreshold: 30
    periodSeconds: 10

usersync:
  startupProbe:
    failureThreshold: 30
    periodSeconds: 10

The default is sufficient for a database backend running locally within the cluster due to the low latency between the services. If the database is operating externally, you can change these timeout values to adjust for the increased network latency, and the resulting slower startup, to avoid startup failures. For example, you can raise the value to ten minutes for both containers:

admin:
  startupProbe:
    failureThreshold: 60
    periodSeconds: 10

usersync:
  startupProbe:
    failureThreshold: 60
    periodSeconds: 10