Configuring Starburst Enterprise with Ranger in Kubernetes
Looking for the installation guide? This topic covers configuring Apache Ranger after you have completed a basic install of Starburst Enterprise. If you have not yet completed the basic install, go to our installation guide.
The starburst-ranger Helm chart configures Apache Ranger 2.3.0 for use in the cluster, using the values.yaml file detailed in the following sections. It allows you to implement global access control, or only Hive access control, with Ranger for the Starburst Enterprise platform (SEP).
It creates the following setup:
A pod running the Ranger server, with the Ranger plugin deployed and configured to connect to SEP
Optionally, a container in the pod running the Ranger LDAP user synchronization system, deployed and configured
Optionally, a container in the pod running a PostgreSQL database as the backend for Ranger
Use your registry credentials, and follow best practices by creating an override file for any changes to the default values.
Docker registry access
This section works the same as the Docker image and registry section of the Helm chart for SEP.
registryCredentials:
  enabled: false
  registry:
  username:
  password:
imagePullSecrets:
  - name:
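As a minimal sketch of an override file, the following snippet turns on registry credentials for the Harbor registry that hosts the chart's default images. The user name and password are placeholders, and the snippet assumes, as the key name suggests, that enabled: true makes the chart use these credentials:

```yaml
# Hypothetical override file, for example ranger-values.yaml
registryCredentials:
  # Assumption: enabling this makes the chart use the credentials below
  enabled: true
  # Registry host taken from the default image repositories of this chart
  registry: harbor.starburstdata.net
  username: "<your-registry-user>"
  password: "<your-registry-password>"
```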
Ranger server
The admin section configures the Ranger server and the included user interface for policy management.
admin:
  image:
    repository: "harbor.starburstdata.net/starburstdata/starburst-ranger-admin"
    tag: "2.3.0-e.4"
    pullPolicy: "IfNotPresent"
  port: 6080
  resources:
    requests:
      memory: "1Gi"
      cpu: 2
    limits:
      memory: "1Gi"
      cpu: 2
  # serviceUser is used by SEP to access Ranger
  serviceUser: "starburst_service"
  passwords:
    admin: "RangerPassword1"
    tagsync: "TagSyncPassword1"
    usersync: "UserSyncPassword1"
    keyadmin: "KeyAdminPassword1"
    service: "StarburstServicePassword1"
  # optional truststore containing CA certificates to use instead of default one
  truststore:
    # existing secret containing truststore.jks key
    secret:
    # password to truststore
    password:
  # Enable the propagation of environment variables from Secrets and Configmaps
  envFrom: []
  env:
    # Additional env variables to pass to Ranger Admin.
    # To pass a Ranger install property, use a variable named RANGER__<property_name>,
    # for example RANGER__authentication_method.
  # Optionally configure a security context for the ranger admin container
  securityContext: {}
admin.serviceUser
The user that SEP uses to access Ranger.
admin.passwords
Several passwords must be set, to any values you choose. They are used for administrative and Ranger-internal purposes, and do not need to be configured or used anywhere else.
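A minimal override sketch that replaces the default passwords and points the Ranger server at a custom CA truststore. All password values are placeholders, and ranger-ca-truststore is a hypothetical secret name; as the comment in the default values states, the secret must contain a truststore.jks key:

```yaml
admin:
  passwords:
    # Placeholder values: choose your own strong passwords
    admin: "<admin-password>"
    tagsync: "<tagsync-password>"
    usersync: "<usersync-password>"
    keyadmin: "<keyadmin-password>"
    service: "<service-password>"
  truststore:
    # Hypothetical existing secret with a truststore.jks key, created for example with:
    # kubectl create secret generic ranger-ca-truststore --from-file truststore.jks
    secret: ranger-ca-truststore
    password: "<truststore-password>"
```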
SEP user synchronization
SEP can actively sync user names and user groups directly to Ranger, as a simpler alternative to the LDAP user synchronization server.
LDAP user synchronization server
You can use the usersync block to configure the synchronization of users and groups between Ranger and your LDAP system, as an alternative to direct sync between SEP and Ranger. When deployed, it runs in a separate sidecar container.
The default configuration enables user synchronization:
usersync:
  enabled: true
  image:
    repository: "harbor.starburstdata.net/starburstdata/ranger-usersync"
    tag: "2.3.0-e.4"
    pullPolicy: "IfNotPresent"
  name: "ranger-usersync"
  resources:
    requests:
      memory: "1Gi"
      cpu: 1
    limits:
      memory: "1Gi"
      cpu: 1
  tls:
    # optional truststore containing CA certificate for ldap server
    truststore:
      # existing secret containing truststore.jks key
      secret:
      # password to truststore
      password:
  # Enable the propagation of environment variables from Secrets and Configmaps
  envFrom: []
  # env is a map of ranger config variables
  env:
    # Use RANGER__<property_name> variables to set Ranger install properties.
    RANGER__SYNC_LDAP_URL: "ldap://ranger-ldap:389"
    RANGER__SYNC_LDAP_BIND_DN: "cn=admin,dc=ldap,dc=example,dc=org"
    RANGER__SYNC_LDAP_BIND_PASSWORD: "cieX7moong3u"
    RANGER__SYNC_LDAP_SEARCH_BASE: "dc=ldap,dc=example,dc=org"
    RANGER__SYNC_LDAP_USER_SEARCH_BASE: "ou=users,dc=ldap,dc=example,dc=org"
    RANGER__SYNC_LDAP_USER_OBJECT_CLASS: "person"
    RANGER__SYNC_GROUP_SEARCH_ENABLED: "true"
    RANGER__SYNC_GROUP_USER_MAP_SYNC_ENABLED: "true"
    RANGER__SYNC_GROUP_SEARCH_BASE: "ou=groups,dc=ldap,dc=example,dc=org"
    RANGER__SYNC_GROUP_OBJECT_CLASS: "groupOfNames"
  # Optionally configure a security context for the ranger usersync container
  securityContext:
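Node name | Description
---|---
usersync.enabled | Enables or disables the user synchronization feature
usersync.name | Name of the pod
usersync.tls.truststore.secret | Name of the secret created from the truststore. This is required if you need to use TLS for usersync.
usersync.tls.truststore.password | Password for the truststore. This is required if you need to use TLS for usersync.
usersync.env | A map of Ranger config variables related to the user synchronization
usersync.env.RANGER__SYNC_LDAP_URL | URL to the LDAP server
usersync.env.RANGER__SYNC_LDAP_BIND_DN | Distinguished name (DN) string used to bind for the LDAP connection
usersync.env.RANGER__SYNC_LDAP_BIND_PASSWORD | Password for the bind DN
usersync.env.RANGER__SYNC_LDAP_SEARCH_BASE | Search base in the LDAP directory
usersync.env.RANGER__SYNC_LDAP_USER_SEARCH_BASE | User information search base in the LDAP directory
usersync.env.RANGER__SYNC_LDAP_USER_OBJECT_CLASS | Object class for users
usersync.env.RANGER__SYNC_GROUP_SEARCH_ENABLED | Enable or disable group search
usersync.env.RANGER__SYNC_GROUP_USER_MAP_SYNC_ENABLED | Enable or disable synchronization of group-user mapping
usersync.env.RANGER__SYNC_GROUP_SEARCH_BASE | Group information search base in the LDAP directory
usersync.env.RANGER__SYNC_GROUP_OBJECT_CLASS | Object class for groups, typically groupOfNames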
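Sensitive values such as the bind password do not have to be stored in the values file. As a sketch, assume a hypothetical Secret named ranger-ldap-bind that defines a RANGER__SYNC_LDAP_BIND_PASSWORD key. The override loads it through envFrom, points usersync at an illustrative LDAP host, and removes the default bind password from env. Setting a key to null deletes a chart default when Helm merges values; this matters because Kubernetes gives variables defined in env precedence over those loaded with envFrom:

```yaml
usersync:
  # Hypothetical Secret, created for example with:
  # kubectl create secret generic ranger-ldap-bind \
  #   --from-literal=RANGER__SYNC_LDAP_BIND_PASSWORD=<password>
  envFrom:
    - secretRef:
        name: ranger-ldap-bind
  env:
    # Illustrative LDAP server; replace with your own
    RANGER__SYNC_LDAP_URL: "ldap://ldap.example.com:389"
    # null removes the chart default so it does not shadow the Secret value
    RANGER__SYNC_LDAP_BIND_PASSWORD: null
```

All other env keys that you do not override keep the chart defaults, because Helm merges maps key by key.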
The following steps enable TLS with the LDAP server:
1. Create a truststore file named truststore.jks containing the CA certificate of the LDAP server.
2. Create a Kubernetes secret ldap-cert from the truststore file:
   kubectl create secret generic ldap-cert --from-file truststore.jks
3. Update the values to reflect the secret name in the tls section.
4. Update the truststore password in the tls section:
   usersync:
     tls:
       enabled: true
       truststore:
         secret: ldap-cert
         password: "truststore password"
Internal backing database server
For testing, you can use the database block to configure a PostgreSQL database that the chart creates within the cluster, as the backend for Ranger policy storage.
Note
Alternatively, for production usage, you can use an external PostgreSQL database that you must manage yourself.
The following YAML nodes are provided by default for configuring the internal backing database:
database:
  type: "internal"
  internal:
    image:
      repository: "library/postgres"
      tag: "10.6"
      pullPolicy: "IfNotPresent"
    volume:
      persistentVolumeClaim:
        storageClassName:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: "2Gi"
    resources:
      requests:
        memory: "1Gi"
        cpu: 2
      limits:
        memory: "1Gi"
        cpu: 2
    port: 5432
    databaseName: "ranger"
    databaseUser: "ranger"
    databasePassword: "RangerPass123"
    databaseRootUser: "rangeradmin"
    databaseRootPassword: "RangerAdminPass123"
    securityContext: {}
    envFrom: []
    env: []
Node name | Description
---|---
database.type | Set to internal to use a PostgreSQL database that the chart creates within the cluster
database.internal.image | Docker container image used for the PostgreSQL server
database.internal.volume | Storage volume to persist the database. The default configuration requests a new persistent volume (PV).
database.internal.volume.persistentVolumeClaim | The default configuration, which requests a new persistent volume (PV).
database.internal.volume.existingVolumeClaim | Alternative volume configuration, which uses an existing volume claim by referencing its name as the value in quotes.
database.internal.volume.emptyDir | Alternative volume configuration, which configures an empty directory on the pod. Keep in mind that a pod replacement loses the database content.
database.internal.databaseName | Name of the internal database
database.internal.databaseUser | User to connect to the internal database
database.internal.databasePassword | Password to connect to the internal database
database.internal.databaseRootUser | User to administrate the internal database, for creating and updating tables and similar operations
database.internal.databaseRootPassword | Password for the administrator to connect to the internal database
database.internal.envFrom | YAML sequence of mappings to define a Secret or ConfigMap as a source of environment variables for the internal PostgreSQL container
database.internal.env | YAML sequence of mappings, each with name and value keys, to define environment variables for the internal PostgreSQL container
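For a test cluster, you might only adjust the storage and the credentials. The following sketch assumes a storage class named standard exists in your cluster (a hypothetical name) and requests a larger volume; the password values are placeholders:

```yaml
database:
  type: "internal"
  internal:
    volume:
      persistentVolumeClaim:
        # Hypothetical storage class; list available classes with
        # kubectl get storageclass
        storageClassName: "standard"
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: "10Gi"
    databasePassword: "<database-password>"
    databaseRootPassword: "<database-root-password>"
```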
Examples
OpenShift deployments often do not have access to pull the default image, library/postgres, from the default Docker registry. You can replace it with an image from the Red Hat registry, which requires additional environment variables set with the parameter database.internal.env:
database:
  type: internal
  internal:
    image:
      repository: "registry.redhat.io/rhscl/postgresql-96-rhel7"
      tag: "latest"
    env:
      - name: POSTGRESQL_DATABASE
        value: "hive"
      - name: POSTGRESQL_USER
        value: "hive"
      - name: POSTGRESQL_PASSWORD
        value: "HivePass1234"
Alternatively, create a Secret, for example postgresql-secret, containing the PostgreSQL variables shown in the previous code block, and pass it to the container with the envFrom parameter:
database:
  type: internal
  internal:
    image:
      repository: "registry.redhat.io/rhscl/postgresql-96-rhel7"
      tag: "latest"
    envFrom:
      - secretRef:
          name: postgresql-secret
External backing database server
This section shows the empty default setup for using an external PostgreSQL database. You must provide the necessary details for the external server and ensure that it can be reached from the k8s cluster pod.
database:
  type: "external"
  external:
    port:
    host:
    databaseName:
    databaseUser:
    databasePassword:
    databaseRootUser:
    databaseRootPassword:
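Node name | Description
---|---
database.type | Set to external to use an external PostgreSQL database
database.external.port | Port to access the external database
database.external.host | Host of the external database
database.external.databaseName | Name of the database
database.external.databaseUser | User to connect to the database. If the user does not already exist, it is automatically created during installation.
database.external.databasePassword | Password to connect to the database
database.external.databaseRootUser | The existing root user to administrate the external database. It is used to create and update tables and for similar operations.
database.external.databaseRootPassword | Password for the administrator to connect to the external database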
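A filled-in sketch for an external server. The host name, the root user name, and all credentials are hypothetical; 5432 is the standard PostgreSQL port:

```yaml
database:
  type: "external"
  external:
    port: 5432
    # Hypothetical host name; it must be reachable from the Ranger pod
    host: "ranger-db.example.com"
    databaseName: "ranger"
    # Created automatically during installation if it does not exist
    databaseUser: "ranger"
    databasePassword: "<database-password>"
    # Hypothetical existing administrator allowed to create and update tables
    databaseRootUser: "postgres"
    databaseRootPassword: "<root-password>"
```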
Additional volumes
Additional volumes can be necessary for storing and accessing persisted files. They can be defined in the additionalVolumes section. None are defined by default:
additionalVolumes: []
You can add one or more volumes supported by k8s to all nodes in the cluster.
If you specify only path, a directory named in path is created. When mounting a ConfigMap or Secret, a file is created in this directory for each key.
This supports an optional subPath parameter, which takes an optional key in the ConfigMap or Secret volume you create. If you specify subPath, the specific key named by subPath from the ConfigMap or Secret is mounted as a file with the name provided by path.
The following example snippet shows both use cases:
additionalVolumes:
  - path: /mnt/InContainer
    volume:
      emptyDir: {}
  - path: /tmp/config.txt
    subPath: config.txt
    volume:
      configMap:
        name: "configmap-in-volume"
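Secrets follow the same rules. As a sketch, the following mounts a hypothetical Secret named ranger-extra-config once as a directory with one file per key, and once as a single file taken from its app.properties key (also a hypothetical name), using the secret volume type with secretName:

```yaml
additionalVolumes:
  # Whole Secret as a directory: one file per key under /mnt/secrets
  - path: /mnt/secrets
    volume:
      secret:
        secretName: "ranger-extra-config"
  # Only the key app.properties, mounted as the file /tmp/app.properties
  - path: /tmp/app.properties
    subPath: app.properties
    volume:
      secret:
        secretName: "ranger-extra-config"
```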
Exposing the cluster to the outside network
The expose section for Ranger works identically to the expose section for SEP. It exposes the Ranger user interface for configuring and managing policies outside the cluster.
Only the configured default values differ. The default type is clusterIp:
expose:
  type: "clusterIp"
  clusterIp:
    name: "ranger"
    ports:
      http:
        port: 6080
The following section shows the default values with an activated nodePort type:
expose:
  type: "nodePort"
  nodePort:
    name: "ranger"
    ports:
      http:
        port: 6080
        nodePort: 30680
The following section shows the default values with an activated loadBalancer type:
expose:
  type: "loadBalancer"
  loadBalancer:
    name: "ranger"
    IP: ""
    ports:
      http:
        port: 6080
    annotations: {}
    sourceRanges: []
The following section shows the default values with an activated ingress type:
expose:
  type: "ingress"
  ingress:
    ingressName: "ranger-ingress"
    serviceName: "ranger"
    servicePort: 6080
    ingressClassName:
    tls:
      enabled: true
      secretName:
    host:
    path: "/"
    annotations: {}
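For example, an ingress override with TLS enabled might look like the following. The ingress class nginx, the TLS secret ranger-tls, and the host ranger.example.com are hypothetical and must match resources that exist in your cluster:

```yaml
expose:
  type: "ingress"
  ingress:
    ingressName: "ranger-ingress"
    serviceName: "ranger"
    servicePort: 6080
    # Hypothetical ingress class installed in your cluster
    ingressClassName: "nginx"
    tls:
      enabled: true
      # Hypothetical secret holding the certificate for the host below
      secretName: "ranger-tls"
    host: "ranger.example.com"
    path: "/"
```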
Datasources
The datasources node lists the SEP clusters for which Ranger services are configured:
# datasources - list of SEP datasources to configure Ranger
# services. It is mounted as file /config/datasources.yaml inside
# container and processed by init script.
datasources:
  - name: "fake-starburst-1"
    host: "starburst.fake-starburst-1-namespace"
    port: 8080
    username: "starburst_service1"
    password: "Password123"
  - name: "fake-starburst-2"
    host: "starburst.fake-starburst-2-namespace"
    port: 8080
    username: "starburst_service2"
    password: "Password123"
Server startup configuration
You can create a startup shell script to customize how Ranger is started, and pass additional arguments to it. The script receives the container name as an input parameter. Possible values are ranger-admin and ranger-usersync. Additional arguments can be configured with extraArguments.
Node name | Description
---|---
initFile | A shell script to run before Ranger is launched. The content of the file has to be an inline string in the YAML file. The script receives the container name, ranger-admin or ranger-usersync, as its input parameter.
extraArguments | List of extra arguments to be passed to the initFile script
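A minimal sketch of a startup script. It assumes the script node is named initFile and that the extraArguments values are passed after the container name; the argument value example-value is hypothetical:

```yaml
# Assumption: node names initFile and extraArguments
initFile: |
  #!/bin/bash
  # $1 is the container name: ranger-admin or ranger-usersync
  echo "Running init script for container: $1"
  if [ "$1" = "ranger-usersync" ]; then
    echo "Preparing the usersync container"
  fi
  # Remaining arguments, assumed to come from extraArguments
  echo "Extra arguments: ${@:2}"
extraArguments:
  - "example-value"
```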
Extra secrets
You can configure additional secrets that are mounted in the /extra-secret/ path on each container.
extraSecret:
  # Replace this with secret name that should be used from namespace you are deploying to
  name:
  # Optionally 'file' may be provided which will be deployed as secret with given 'name' in used namespace.
  file:
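As a sketch, referencing an existing secret only needs its name; ranger-extra-secret is a hypothetical secret in the release namespace. Its contents are then presumably available as files under /extra-secret/ in each container:

```yaml
extraSecret:
  # Hypothetical secret that already exists in the namespace you deploy to
  name: "ranger-extra-secret"
```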
Node assignment
You can configure your cluster to determine the node and pod to use for the Ranger server:
nodeSelector: {}
tolerations: []
affinity: {}
Our SEP configuration documentation contains examples and resources to help you configure these YAML nodes.
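These nodes take standard Kubernetes scheduling settings. The following sketch pins the Ranger pod to nodes carrying a hypothetical label workload=ranger and tolerates a matching hypothetical taint:

```yaml
nodeSelector:
  # Hypothetical label, added for example with:
  # kubectl label nodes <node-name> workload=ranger
  workload: ranger
tolerations:
  # Tolerates a hypothetical taint workload=ranger:NoSchedule
  - key: "workload"
    operator: "Equal"
    value: "ranger"
    effect: "NoSchedule"
```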
Annotations
You can add configuration to annotate the deployment and pod:
deploymentAnnotations: {}
podAnnotations: {}
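For example, the following adds an annotation to both the deployment and the pod; the key example.com/owner and its value are illustrative:

```yaml
deploymentAnnotations:
  example.com/owner: "data-platform"
podAnnotations:
  example.com/owner: "data-platform"
```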
Security context
You can optionally configure security contexts to define privilege and access control settings for the Ranger containers. You can configure the security context separately for the Ranger admin, usersync, and database containers.
securityContext:
If you do not want to apply the securityContext to the default service account, you can restrict it by configuring a service account for the Ranger pod.
For a restricted environment like OpenShift, you may need to set the AUDIT_WRITE capability for Ranger usersync:
usersync:
  securityContext:
    capabilities:
      add:
        - AUDIT_WRITE
Additionally, OpenShift clusters need the anyuid and privileged security context constraints set for the service account used by Ranger. For example:
oc create serviceaccount <k8s-service-account>
oc adm policy add-scc-to-user anyuid system:serviceaccount:<k8s-namespace>:<k8s-service-account>
oc adm policy add-scc-to-user privileged system:serviceaccount:<k8s-namespace>:<k8s-service-account>
Service account
You can configure a service account for the Ranger pod using:
serviceAccountName:
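On OpenShift, the security context constraints only take effect if the Ranger pod actually runs as the service account that received them. A sketch, assuming a service account named ranger-sa (a hypothetical name) created with oc create serviceaccount, combined with the usersync capability described in the security context section:

```yaml
# Hypothetical service account; it must exist in the release namespace
serviceAccountName: "ranger-sa"
usersync:
  securityContext:
    capabilities:
      add:
        - AUDIT_WRITE
```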
Environment variables
You can pass environment variables to the Ranger container using the same mechanism used for the internal database:
envFrom: []
env: []
Both are specified as sequences of mappings, for example:
envFrom:
  - secretRef:
      name: my-secret-with-vars
env:
  - name: MY_VARIABLE
    value: some-value
Enable TLS
The process of enabling TLS between Ranger and SEP with the Helm chart is identical to the standard process. The keystore files have to be added as additional files.
In the values for the SEP Helm chart, use the coordinator etcFiles mechanism to add a properties file, inline the XML configuration file that references the keystore files, and add additional properties to config.properties:
coordinator:
  etcFiles:
    properties:
      access-control-ranger.properties: |
        access-control.name=ranger
        ...
    other:
      ranger-policymgr-ssl.xml: |
        <configuration>
        ...
  additionalProperties: |
    access-control.config-files=/etc/starburst/access-control-ranger.properties
Add the SSL configuration file path to all catalogs in the catalogs node, using the default path and the configured file name:
catalogs:
  examplecatalog: |
    connector.name=...
    ...
    ranger.plugin-policy-ssl-config-file=/etc/starburst/ranger-policymgr-ssl.xml
TLS Encryption with Ranger
If your organization requires network traffic to be encrypted all the way to the Ranger pod, rather than terminating TLS at the load balancer, you must configure the Ranger values.yaml file as shown in the following example:
expose:
  type: "loadBalancer"
  loadBalancer:
    name: "ranger-lb"
    ports:
      http:
        port: 6182
admin:
  port: 6182
  probes:
    readinessProbe:
      httpGet:
        path: /
        port: 6182
        scheme: HTTPS
      failureThreshold: 10
      periodSeconds: 10
    livenessProbe:
      httpGet:
        path: /
        port: 6182
        scheme: HTTPS
  env:
    RANGER__policymgr_http_enabled: "false"
    RANGER__policymgr_external_url: https://localhost:6182
    RANGER__policymgr_https_keystore_file: /tmp/rangercert/ranger-admin-keystore.jks
    RANGER__policymgr_https_keystore_keyalias: ranger-admin.docker.cluster
    RANGER__policymgr_https_keystore_password: password
additionalVolumes:
  - path: /tmp/rangercert
    volume:
      secret:
        secretName: "ranger-admin-keystore"
The following table explains the relevant YAML sections:
YAML section | Purpose
---|---
expose | Configures Ranger to deploy its own load balancer, encrypting network traffic to the Ranger pod. Without this section, TLS-encrypted network traffic intended for Ranger is decrypted before reaching Ranger.
admin | Declares the TLS configuration for Ranger’s administration container that processes external traffic. The port must be configured to use the Ranger TLS port number, 6182.
admin.probes | Redefines the readiness and liveness probes (https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes) to use the HTTPS protocol and port 6182.
admin.env | Defines several environment variables used by Ranger: RANGER__policymgr_http_enabled disables plain HTTP; RANGER__policymgr_external_url sets the external HTTPS URL of Ranger admin; RANGER__policymgr_https_keystore_file points to the keystore in the mounted volume; RANGER__policymgr_https_keystore_keyalias and RANGER__policymgr_https_keystore_password set the key alias and the keystore password.
additionalVolumes | Declares the k8s volume containing the Java keystore for Ranger to use. This must be deployed to the k8s cluster as a secret whose name is given in the secretName property.
More information about working with Java keystore certificates is available in the JKS files and PEM files documentation.
Configure startup probe timeouts
You can define startup probe timeouts for the Ranger Admin as well as the user synchronization container. This gives the applications time to connect to the database backend and finish starting. The default timeout of 300 seconds is the product failureThreshold * periodSeconds of the default values, 30 attempts and 10 seconds:
admin:
  startupProbe:
    failureThreshold: 30
    periodSeconds: 10
usersync:
  startupProbe:
    failureThreshold: 30
    periodSeconds: 10
The default is sufficient for a database backend running locally within the cluster, due to the low latency between the services. If the database runs externally, you can raise these timeout values to account for the higher network latency and the resulting slower startup, and so avoid startup failures. For example, you can raise the timeout to ten minutes (60 * 10 = 600 seconds) for both containers:
admin:
  startupProbe:
    failureThreshold: 60
    periodSeconds: 10
usersync:
  startupProbe:
    failureThreshold: 60
    periodSeconds: 10