Configuring Starburst Enterprise with Ranger in Kubernetes#
The starburst-ranger Helm chart configures Apache Ranger 2.5.0 usage in the cluster with the values.yaml file detailed in the following sections. It allows you to implement global access control or just Hive access control with Ranger for Starburst Enterprise platform (SEP).
It creates the following setup:
A pod with the Ranger server with the Ranger plugin deployed and configured to connect to SEP
Optionally a container in the pod with Ranger LDAP user synchronization system deployed and configured
Optionally a container in the pod with a PostgreSQL database backend for Ranger
Use your registry credentials, and follow best practices by creating an override file for changes to default values as desired.
Before you begin#
Get the latest starburst-ranger Helm chart as described in our Kubernetes guide with the configured registry access.
This topic assumes you are familiar with Ranger, as well as with Helm charts and Kubernetes (k8s) tools such as kubectl. Ensure that you are familiar with the following Starburst Enterprise Kubernetes topics before configuring and deploying Ranger:
Configure Ranger#
There are several top-level nodes in the Ranger Helm chart that you must modify for a minimum Ranger configuration:
admin
usersync
database
expose
If you use TLS, you must configure it as well. This section covers getting started with these four configuration steps. It also provides details about the content of the Ranger Helm chart.
As with SEP, we strongly suggest that you initially deploy Ranger with the minimum configuration described in this topic, and ensure that it deploys and is accessible before making any additional customizations described in our reference documentation.
Note
Store customizations in a separate file containing only changed values as recommended in our best practices. In this topic, customizations are stored in a file named ranger-values.yaml that is used in the helm upgrade command.
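As a minimal sketch of such an override file, the following ranger-values.yaml sets only the passwords in the admin node and leaves every other chart default untouched. The password values are placeholders chosen for illustration, not recommendations.

```yaml
# ranger-values.yaml: contains only values that differ from the chart defaults
admin:
  passwords:
    admin: "ExampleAdminPassword1"        # placeholder value
    tagsync: "ExampleTagSyncPassword1"    # placeholder value
    usersync: "ExampleUserSyncPassword1"  # placeholder value
    keyadmin: "ExampleKeyAdminPassword1"  # placeholder value
    service: "ExampleServicePassword1"    # placeholder value
```

Because Helm merges the override file over the chart defaults, values that are not listed here keep their default settings.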
Configure the Ranger administration container#
The following values must be defined in the admin node of the Ranger Helm chart:
CPU resources for requests and limits: The defaults are sufficient for most environments; however, they must work with the instance type you are using.
Memory resources for requests and limits: The defaults are sufficient for most environments; however, they must work with the instance type you are using.
Passwords: You must supply all passwords in the passwords node.
The admin section configures the Ranger server and the included user interface for policy management.
admin:
  image:
    repository: "harbor.starburstdata.net/starburstdata/starburst-ranger-admin"
    tag: "2.5.0-e.4"
    pullPolicy: "IfNotPresent"
  port: 6080
  resources:
    requests:
      memory: "1Gi"
      cpu: 2
    limits:
      memory: "1Gi"
      cpu: 2
  # serviceUser is used by SEP to access Ranger
  serviceUser: "starburst_service"
  passwords:
    admin: "RangerPassword1"
    tagsync: "TagSyncPassword1"
    usersync: "UserSyncPassword1"
    keyadmin: "KeyAdminPassword1"
    service: "StarburstServicePassword1"
  # optional truststore containing CA certificates to use instead of default one
  truststore:
    # existing secret containing truststore.jks key
    secret:
    # password to truststore
    password:
  # Enable the propagation of environment variables from Secrets and Configmaps
  envFrom: []
  env:
    # Additional env variables to pass to Ranger Admin.
    # To pass Ranger install property, use variable with name RANGER__<property_name>,
    # for example RANGER__authentication_method.
  securityContext: {}
    # Optionally configure a security context for the ranger admin container
admin.serviceUser
The operating system user that is used to run the Ranger application.
admin.passwords
A number of passwords need to be set to any desired values. They are used for administrative and Ranger internal purposes and do not need to be changed or used elsewhere.
Configure Ranger user synchronization#
User synchronization automates the process of adding users to Ranger for policy enforcement by allowing the synchronization of users and groups from LDAP, including Active Directory.
You can use the usersync block to configure the details of the synchronization of users and groups between Ranger and your LDAP system, as an alternative to direct sync between SEP and Ranger.
It runs in a separate sidecar container when deployed.
At a minimum, the env properties in the top-level usersync node must be defined correctly for your environment. The default configuration enables user synchronization:
usersync:
  enabled: true
  env:
    # Use RANGER__<property_name> variables to set Ranger install properties.
    RANGER__SYNC_GROUP_OBJECT_CLASS: groupOfNames
    RANGER__SYNC_GROUP_SEARCH_BASE: ou=groups,dc=ldap,dc=example,dc=org
    RANGER__SYNC_GROUP_SEARCH_ENABLED: "true"
    RANGER__SYNC_GROUP_USER_MAP_SYNC_ENABLED: "true"
    RANGER__SYNC_LDAP_BIND_DN: cn=admin,dc=ldap,dc=example,dc=org
    RANGER__SYNC_LDAP_BIND_PASSWORD: p@ssw0rd!
    RANGER__SYNC_LDAP_SEARCH_BASE: dc=ldap,dc=example,dc=org
    RANGER__SYNC_LDAP_URL: ldap://ranger-ldap:389
    RANGER__SYNC_LDAP_USER_OBJECT_CLASS: person
    RANGER__SYNC_LDAP_USER_SEARCH_BASE: ou=users,dc=ldap,dc=example,dc=org
  securityContext:
    # Optionally configure a security context for the ranger usersync container
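| Node name | Description |
|---|---|
| usersync.enabled | Enables or disables user synchronization feature |
| | Name of the pod |
| usersync.tls.truststore.secret | Name of the secret created from the truststore. This is required if you need to use TLS for usersync. |
| usersync.tls.truststore.password | Password for the truststore. This is required if you need to use TLS for usersync. |
| usersync.env | A map of Ranger config variables related to the user synchronization |
| usersync.env.RANGER__SYNC_LDAP_URL | URL to the LDAP server |
| usersync.env.RANGER__SYNC_LDAP_BIND_DN | Distinguished name (DN) string used to bind for the LDAP connection |
| usersync.env.RANGER__SYNC_LDAP_USER_SEARCH_BASE | User information search base in the LDAP directory |
| usersync.env.RANGER__SYNC_LDAP_USER_OBJECT_CLASS | Object class for users |
| usersync.env.RANGER__SYNC_GROUP_SEARCH_ENABLED | Enable or disable group search |
| usersync.env.RANGER__SYNC_GROUP_USER_MAP_SYNC_ENABLED | Enable or disable synchronization of group-user mapping |
| usersync.env.RANGER__SYNC_GROUP_SEARCH_BASE | Group information search base in the LDAP directory |
| usersync.env.RANGER__SYNC_GROUP_OBJECT_CLASS | Object class for groups, typically groupOfNames |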
The following steps can be used to enable TLS with the LDAP server:
Create a truststore file named truststore.jks from the LDAP server
Create a Kubernetes secret ldap-cert from the truststore file
kubectl create secret generic ldap-cert --from-file truststore.jks
Update values to reflect the secret name in the tls section
Update truststore password in the tls section
tls:
  enabled: true
  truststore:
    secret: ldap-cert
    password: "truststore password"
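Taken together, the steps above might produce a usersync block like the following sketch. The ldaps:// scheme and port 636 are assumptions about how the LDAP server exposes TLS, and the truststore password is a placeholder; adjust both to your environment.

```yaml
usersync:
  enabled: true
  tls:
    enabled: true
    truststore:
      # Secret created with kubectl in the steps above
      secret: ldap-cert
      # placeholder, replace with the real truststore password
      password: "truststore password"
  env:
    # assumed LDAPS endpoint; host name taken from the default configuration
    RANGER__SYNC_LDAP_URL: ldaps://ranger-ldap:636
```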
SEP user synchronization#
SEP can actively sync user names and user groups directly to Ranger as a simpler alternative to the approach described in Configure Ranger user synchronization.
Configure the PostgreSQL backend database#
The configuration properties for the internal PostgreSQL backend database that stores policy information are found in the database top-level node.
Note
Alternatively, you can use an external PostgreSQL database for production usage that you must manage yourself.
As a minimal configuration, you must ensure that the following are set correctly for your environment:
database:
  type: "internal"
  internal:
    port: 5432
    databaseName: "ranger"
    databaseUser: "ranger"
    databasePassword: "RangerPass123"
    databaseRootUser: "rangeradmin"
    databaseRootPassword: "RangerAdminPass123"
You may also configure volume persistence and resources, as well as the resources for the backing database itself in the database node.
| Node name | Description |
|---|---|
| database.type | Set to internal |
| database.internal.image | Docker container images used for the PostgreSQL server |
| database.internal.volume | Storage volume to persist the database. The default configuration requests a new persistent volume (PV). |
| | The default configuration, which requests a new persistent volume (PV). |
| | Alternative volume configuration, which uses an existing volume claim by referencing the name as the value in quotes. |
| | Alternative volume configuration, which configures an empty directory on the pod, keeping in mind that a pod replacement loses the database content. |
| database.internal.databaseName | Name of the internal database |
| database.internal.databaseUser | User to connect to the internal database |
| database.internal.databasePassword | Password to connect to the internal database |
| database.internal.databaseRootUser | User to administrate the internal database for creating and updating tables and similar operations. |
| database.internal.databaseRootPassword | Password for the administrator to connect to the internal database |
| database.internal.envFrom | YAML sequence of mappings to define Secret or Configmap as a source of environment variables for the internal PostgreSQL container. |
| database.internal.env | YAML sequence of mappings to define two key environment variables for the internal PostgreSQL container. |
Examples#
OpenShift deployments often do not have access to pull from the default Docker registry library/postgres. You can replace it with an image from the Red Hat registry, which requires additional environment variables set with the parameter database.internal.env:
database:
  type: internal
  internal:
    image:
      repository: "registry.redhat.io/rhscl/postgresql-96-rhel7"
      tag: "latest"
    env:
      - name: POSTGRESQL_DATABASE
        value: "hive"
      - name: POSTGRESQL_USER
        value: "hive"
      - name: POSTGRESQL_PASSWORD
        value: "HivePass1234"
Another option is to create a Secret (for example, postgresql-secret) containing the variables needed by postgresql that are mentioned in the previous code block, and pass it to the container with the envFrom parameter:
database:
  type: internal
  internal:
    image:
      repository: "registry.redhat.io/rhscl/postgresql-96-rhel7"
      tag: "latest"
    envFrom:
      - secretRef:
          name: postgresql-secret
External backend database server#
This section shows the empty default setup for using an external PostgreSQL database. You must provide the necessary details for the external server and ensure that it can be reached from the k8s cluster pod.
database:
  type: "external"
  external:
    port:
    host:
    databaseName:
    databaseUser:
    databasePassword:
    databaseRootUser:
    databaseRootPassword:
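| Node name | Description |
|---|---|
| database.type | Set to external |
| database.external.port | Port to access the external database |
| database.external.host | Host of the external database |
| database.external.databaseName | Name of the database |
| database.external.databaseUser | User to connect to the database. If the user does not already exist, it is automatically created during installation. |
| database.external.databasePassword | Password to connect to the database |
| database.external.databaseRootUser | The existing root user to administrate the external database. It is used to create and update tables and similar operations. |
| database.external.databaseRootPassword | Password for the administrator to connect to the external database |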
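For illustration, a filled-in external configuration could look like the following sketch. The host name and all credentials are hypothetical placeholders, and 5432 is simply the standard PostgreSQL port.

```yaml
database:
  type: "external"
  external:
    port: 5432
    host: "postgres.example.internal"       # hypothetical host name
    databaseName: "ranger"
    databaseUser: "ranger"
    databasePassword: "ExampleRangerPass1"  # placeholder value
    databaseRootUser: "postgres"            # assumed existing root user
    databaseRootPassword: "ExampleRootPass1" # placeholder value
```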
Expose the cluster to the outside network#
The expose section for Ranger works identically to the expose section for SEP. It exposes the Ranger user interface for configuring and managing policies outside the cluster.
Differences are isolated to the configured default values. The default type is clusterIp:
expose:
  type: "clusterIp"
  clusterIp:
    name: "ranger"
    ports:
      http:
        port: 6080
The following section shows the default values with an activated nodePort type:
expose:
  type: "nodePort"
  nodePort:
    name: "ranger"
    ports:
      http:
        port: 6080
        nodePort: 30680
The following section shows the default values with an activated loadBalancer type:
expose:
  type: "loadBalancer"
  loadBalancer:
    name: "ranger"
    IP: ""
    ports:
      http:
        port: 6080
    annotations: {}
    sourceRanges: []
The following section shows the default values with an activated ingress type:
expose:
  type: "ingress"
  ingress:
    ingressName: "ranger-ingress"
    serviceName: "ranger"
    servicePort: 6080
    ingressClassName:
    tls:
      enabled: true
      secretName:
    host:
    path: "/"
    annotations: {}
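As a hedged illustration, the ingress values might be filled in as follows. The host name, the TLS secret name, and the nginx ingress class are assumptions for this sketch; use the values that match your ingress controller and certificate.

```yaml
expose:
  type: "ingress"
  ingress:
    ingressName: "ranger-ingress"
    serviceName: "ranger"
    servicePort: 6080
    ingressClassName: "nginx"        # assumed ingress class
    tls:
      enabled: true
      secretName: "ranger-tls-cert"  # hypothetical TLS secret in the same namespace
    host: "ranger.example.com"       # hypothetical external host name
    path: "/"
    annotations: {}
```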
Configure TLS (optional)#
Note
This is separate from configuring TLS in SEP itself.
If your organization uses TLS, you must enable and configure Ranger to work with it. The most straightforward way to handle TLS is to terminate TLS at the load balancer or ingress, using a signed certificate. We strongly suggest this method, which requires no additional configuration in Ranger. Ranger can also be configured to listen on HTTPS directly.
If you choose not to handle TLS using those methods, you can instead configure it in the usersync and expose nodes of the Ranger Helm chart. The following snippets show the nodes of each of these sections:
usersync:
  tls:
    # optional truststore containing CA certificate for ldap server
    truststore:
      # existing secret containing truststore.jks key
      secret:
      # password to truststore
      password:
expose:
  type: "[clusterIp|nodePort|loadBalancer|ingress]"
The default expose type is clusterIp. However, this is not suitable for production environments. If you need help choosing which type is best, refer to the expose documentation for SEP.
The process of enabling TLS between Ranger and SEP with the Helm chart is identical to the normal process. The keystore files have to be added as additional files.
The configuration in the YAML file has to use the coordinator nodes mechanism to add a properties file, inline the XML configuration file that references the keystore files, and add additional properties to config.properties:
coordinator:
  etcFiles:
    properties:
      access-control-ranger.properties: |
        access-control.name=ranger
        ...
    other:
      ranger-policymgr-ssl.xml: |
        <configuration>
        ...
  additionalProperties: |
    access-control.config-files=/etc/starburst/access-control-ranger.properties
Add the SSL config file path to all catalogs in the Catalogs node using the default path and configured filename:
catalogs:
  examplecatalog: |
    connector.name=...
    ...
    ranger.plugin-policy-ssl-config-file=/etc/starburst/ranger-policymgr-ssl.xml
TLS Encryption with Ranger#
If your organization requires network traffic to be encrypted to the Ranger pod rather than terminating TLS at the load balancer, you must configure the Ranger values.yaml file as shown in the following example:
expose:
  type: "loadBalancer"
  loadBalancer:
    name: "ranger-lb"
    ports:
      http:
        port: 6182
admin:
  port: 6182
  probes:
    readinessProbe:
      httpGet:
        path: /
        port: 6182
        scheme: HTTPS
      failureThreshold: 10
      periodSeconds: 10
    livenessProbe:
      httpGet:
        path: /
        port: 6182
        scheme: HTTPS
  env:
    RANGER__policymgr_http_enabled: false
    RANGER__policymgr_external_url: https://localhost:6182
    RANGER__policymgr_https_keystore_file: /tmp/rangercert/ranger-admin-keystore.jks
    RANGER__policymgr_https_keystore_keyalias: ranger-admin.docker.cluster
    RANGER__policymgr_https_keystore_password: password
additionalVolumes:
  - path: /tmp/rangercert
    volume:
      secret:
        secretName: "ranger-admin-keystore"
The following table explains the relevant YAML sections:
| YAML section | Purpose |
|---|---|
| expose | Configures Ranger to deploy its own load balancer, encrypting network traffic to the Ranger pod. Without this section, TLS encrypted network traffic intended for Ranger is decrypted before reaching Ranger. |
| admin | Declares the TLS configuration for Ranger’s administration container that processes external traffic. The port must be configured to use the Ranger TLS port number, 6182. |
| admin.probes | Redefines the readiness and liveness probes to use the HTTPS protocol, and to use port 6182. |
| admin.env | Defines several environment variables used by Ranger. |
| additionalVolumes | Declares the k8s volume containing the Java keystore for Ranger to use. This must be deployed to the k8s cluster as a secret whose name is given in the secretName node. |
More information about working with Java keystore certificates is available in the JKS files and PEM files documentation.
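The keystore secret referenced in additionalVolumes has to exist in the cluster before Ranger is deployed. One way to provide it, sketched here under assumptions, is a Secret manifest whose key matches the file name used in RANGER__policymgr_https_keystore_file. The data value is a placeholder for the base64-encoded keystore and must be replaced before the manifest is applied.

```yaml
apiVersion: v1
kind: Secret
metadata:
  # Must match secretName in the additionalVolumes section
  name: ranger-admin-keystore
type: Opaque
data:
  # Placeholder: replace with the base64-encoded content of the JKS file
  ranger-admin-keystore.jks: <base64-encoded keystore content>
```

Each key in the Secret appears as a file under the mount path, so this key is available to Ranger as /tmp/rangercert/ranger-admin-keystore.jks.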
Deploy Ranger#
When Ranger is configured, run the following command to deploy it. In this example, the minimal values YAML file with the registry credentials named registry-access.yaml is used along with the ranger-values.yaml containing the Ranger customizations:
$ helm upgrade ranger starburst/starburst-ranger \
    --install \
    --values ./registry-access.yaml \
    --values ./ranger-values.yaml
Additional settings#
Server start up configuration#
You can create a startup shell script to customize how Ranger is started, and pass additional arguments to it.
The script receives the container name as input parameter. Possible values are ranger-admin and ranger-usersync. Additional arguments can be configured with extraArguments.
| Node name | Description |
|---|---|
| | A shell script to run before Ranger is launched. The content of the file has to be an inline string in the YAML file. |
| extraArguments | List of extra arguments to be passed to the startup script |
Docker registry access#
This works the same as the Docker image and registry section of the Helm chart for SEP.
registryCredentials:
  enabled: false
  registry:
  username:
  password:
imagePullSecrets:
  - name:
Additional volumes#
Additional volumes can be necessary for storing and accessing persisted files. They can be defined in the additionalVolumes section. None are defined by default:
additionalVolumes: []
You can add one or more volumes supported by k8s to all nodes in the cluster.
If you specify path only, a directory named in path is created. When mounting a ConfigMap or Secret, files are created in this directory for each key.
This supports an optional subPath parameter, which takes a key in the ConfigMap or Secret volume you create. If you specify subPath, the specific key named subPath from the ConfigMap or Secret is mounted as a file with the name provided by path.
The following example snippet shows both use cases:
additionalVolumes:
  - path: /mnt/InContainer
    volume:
      emptyDir: {}
  - path: /tmp/config.txt
    subPath: config.txt
    volume:
      configMap:
        name: "configmap-in-volume"
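The same subPath mechanism applies to Secret volumes. The following sketch mounts a single key of a Secret as one file; the Secret name, the key, and the target path are hypothetical and only illustrate the shape of the entry.

```yaml
additionalVolumes:
  - path: /tmp/certs/ldap-ca.pem   # hypothetical file path inside the container
    subPath: ca.pem                # hypothetical key inside the Secret
    volume:
      secret:
        secretName: "ldap-ca"      # hypothetical Secret in the same namespace
```

Without subPath, every key of the Secret would instead appear as a separate file in a directory created at path.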
Data sources#
Data sources are mounted inside the container as a file named /config/datasources.yaml. The file is processed by an init script.
The following YAML configuration block defines a list of SEP data sources:
datasources:
  - name: "fake-starburst-1"
    host: "starburst.fake-starburst-1-namespace"
    port: 8080
    username: "starburst_service1"
    password: "Password123"
  - name: "fake-starburst-2"
    host: "starburst.fake-starburst-2-namespace"
    port: 8080
    username: "starburst_service2"
    password: "Password123"
Extra secrets#
You can configure additional secrets that are mounted in the /extra-secret/ path on each container.
extraSecret:
  # Replace this with secret name that should be used from namespace you are deploying to
  name:
  # Optionally 'file' may be provided which will be deployed as secret with given 'name' in used namespace.
  file:
Node assignment#
You can configure your cluster to determine the node and pod to use for the Ranger server:
nodeSelector: {}
tolerations: []
affinity: {}
Our SEP configuration documentation contains examples and resources to help you configure these YAML nodes.
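As a rough sketch, the following values schedule the Ranger pod only on nodes that carry a particular label and tolerate a matching taint. The label and taint names are hypothetical; the field names follow the standard Kubernetes pod specification.

```yaml
nodeSelector:
  workload: ranger        # hypothetical node label
tolerations:
  - key: "dedicated"      # hypothetical taint key
    operator: "Equal"
    value: "ranger"
    effect: "NoSchedule"
affinity: {}
```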
Annotations#
You can add configuration to annotate the deployment and pod:
deploymentAnnotations: {}
podAnnotations: {}
Security context#
You can optionally configure security contexts to define privilege and access control settings for the Ranger containers. You can separately configure the security context for Ranger admin, usersync and database containers.
securityContext:
If you do not want to set the securityContext for the default service account, you can restrict it by configuring the service account for the Ranger pod.
For a restricted environment like OpenShift, you may need to set the AUDIT_WRITE capability for Ranger usersync:
securityContext:
  capabilities:
    add:
      - AUDIT_WRITE
Additionally, OpenShift clusters need the anyuid and privileged security context constraints set for the service account used by Ranger. For example:
oc create serviceaccount <k8s-service-account>
oc adm policy add-scc-to-user anyuid system:serviceaccount:<k8s-namespace>:<k8s-service-account>
Service account#
You can configure a service account for the Ranger pod using:
serviceAccountName:
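The service account has to exist in the namespace before the chart references it. A minimal sketch of such an account as a Kubernetes manifest follows; the account name ranger-sa and the namespace are hypothetical.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ranger-sa          # hypothetical account name
  namespace: ranger        # hypothetical namespace of the Ranger deployment
```

With that account in place, the override file would set serviceAccountName to ranger-sa.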
Environment variables#
You can pass environment variables to the Ranger container using the same mechanism used for the internal database:
envFrom: []
env: []
Both are specified as sequences of mappings, for example:
envFrom:
  - secretRef:
      name: my-secret-with-vars
env:
  - name: MY_VARIABLE
    value: some-value
Configure startup probe timeouts#
You can define startup probe timeouts for the Ranger Admin as well as the user synchronization container. This allows the applications to connect to the database backend and complete starting the application. The default setting of 300 seconds is the result of failureThreshold * periodSeconds with the default values of 30 attempts and 10 seconds.
admin:
  probes:
    startupProbe:
      failureThreshold: 30
      periodSeconds: 10
usersync:
  startupProbe:
    failureThreshold: 30
    periodSeconds: 10
The default is sufficient for a database backend running locally within the cluster due to the low latency between the services. If the database is operating externally, you can change these timeout values to adjust for the increased network latency, and the resulting slower startup, to avoid startup failures. For example, you can raise the value to ten minutes (60 * 10 seconds) for both containers:
admin:
  probes:
    startupProbe:
      failureThreshold: 60
      periodSeconds: 10
usersync:
  startupProbe:
    failureThreshold: 60
    periodSeconds: 10
Configure location privileges (optional)#
You can set location privileges to ensure the correct users have access to create objects in specific object storage locations. Location privileges support CREATE TABLE and CREATE SCHEMA operations, as well as CALL system.register_partition for Hive catalogs.
To configure location privileges in a Kubernetes deployment, define a new etcFiles.properties.location-access-control.properties section of the top-level coordinator node in the values.yaml file:
coordinator:
  etcFiles:
    properties:
      location-access-control.properties: |
        location-access-control.name=ranger
        ranger.policy-rest-url=some_url
        ranger.service-name=service_name
        ranger.username=name
        ranger.password=pass
In Ranger, you must create the appropriate policies as locations are denied by default. Location privileges support recursive or non-recursive policies. For example, if you have a recursive policy with the location /tmp/allow then /tmp/allow/nested is valid.
Additionally, policies can contain wildcards, such as /tmp/*/my_table.
Next steps#
Review the following topics for next configuration steps:
Read about Starburst Enterprise and Ranger
Complete your Ranger configuration