Apache Ranger overview#

Apache Ranger is a tool to manage access control policies for Hadoop/Hive and related object storage systems such as Delta Lake. It provides a simple and intuitive web-based console for creating and managing policies controlling access to the data.

The Privacera Platform, powered by Apache Ranger, is an extended commercial distribution of Apache Ranger that can also be used.

Starburst Enterprise platform (SEP) can be integrated with Ranger as an access control system. When a query is submitted to SEP, SEP parses and analyzes the query to understand the privileges required by the user to access objects such as schemas and tables. Once a list of these objects is created, SEP communicates with the Ranger service to determine if the request is valid. If the request is valid, the query continues to execute. If the request is invalid, because the user does not have the necessary privileges to query an object, an error is returned. Ranger policies are cached in SEP to improve performance.
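The flow described above can be pictured with a minimal sketch. It is not SEP's implementation: the names RequiredPrivilege, AccessDeniedError, and authorize, as well as the dotted resource naming, are hypothetical, and a plain dictionary stands in for the locally cached Ranger policies.

```python
from dataclasses import dataclass


class AccessDeniedError(Exception):
    """Raised when a required privilege is missing."""


@dataclass(frozen=True)
class RequiredPrivilege:
    resource: str      # hypothetical naming, e.g. "hdfs.sales.orders"
    access_type: str   # e.g. "SELECT"


def authorize(user, required, cached_grants):
    """Check every privilege a query needs before it is allowed to run.

    cached_grants is a plain dict standing in for the cached policies:
    user -> set of (resource, access_type) pairs.
    """
    granted = cached_grants.get(user, set())
    for privilege in required:
        if (privilege.resource, privilege.access_type) not in granted:
            raise AccessDeniedError(
                f"{user} lacks {privilege.access_type} on {privilege.resource}"
            )


cached_grants = {"alice": {("hdfs.sales.orders", "SELECT")}}
needed = [RequiredPrivilege("hdfs.sales.orders", "SELECT")]
authorize("alice", needed, cached_grants)  # returns without raising
```

A user without a matching grant, such as a hypothetical bob, causes authorize to raise AccessDeniedError, which corresponds to the error returned for an invalid request.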

Authentication is handled outside of Ranger, for example using LDAP. Ranger uses the authenticated user and the user's groups to match against policy definitions.

Note

SEP integration with Ranger requires a valid Starburst Enterprise license.

Requirements#

Before you configure SEP for any integration with Apache Ranger or Privacera Platform, verify the following prerequisites:

  • The SEP coordinator and workers have the appropriate network access to communicate with the Ranger service. Typically this is port 6080, or port 6182 if SSL is used.

  • Apache Ranger 2.0.0 or higher must be used.

  • Privacera Platform 3.6.0.63 should be used.

Ranger usage options#

SEP offers the following different integrations with Ranger:

We highly recommend implementing Ranger for global access control. This allows you to use Ranger policies for all configured catalogs.

Note

When used for global access control, the Starburst Ranger integration extends the basic functionality of Ranger with the Starburst Ranger plugin. It allows Ranger to provide access control for all data sources defined by a catalog in Starburst Enterprise, and all other data sources supported by SEP.

Key concepts#

The concepts and features described in the following section apply to all Ranger usage.

Policies#

A policy is a combination of a set of resources and the associated privileges. Ranger provides a user interface, or optionally a REST API, to create and manage these access control policies.

Resource sets#

A resource set includes one or more resources of different resource types. Wildcard characters are supported to select a number of resources based on a pattern.

  • catalog

  • catalog - schema

  • catalog - schema - table

  • catalog - schema - table - column

  • catalog - schema - procedure

  • catalog - session property

  • function

  • system session property

  • query

  • user

As you can see from the list above, some resources are hierarchically organized within a catalog and below. This allows you, for example, to restrict access to a complete catalog, a specific schema or table, or even down to a column or a procedure within a schema.

For example, you can define a set of resources that restricts access to the two tables credit-info and cards-info in all schemas in the hdfs catalog:

  • Catalog: hdfs

  • Schema: *

  • Table: credit-info, cards-info

A set of resources works as a primary key for a policy, so it needs to be unique. Multiple policies, however, may cover a single resource because of wildcards.

It is best to create fine-grained resource sets, especially when using column masking and row filtering. Policies with wildcards can cause behavior that is hard to understand, or even unpredictable, when multiple policies apply to the same resource. For example, both *-schema-table-column and catalog-*-table-column apply to the same column of a table in a catalog. The second definition is more specific and is therefore preferred, because it keeps your configuration easier to understand.
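To see how a wildcard lets several policies cover one resource, consider the following sketch. It assumes shell-style matching from Python's fnmatch module as a stand-in for Ranger's wildcard handling, which may differ in detail. The policy names and the broad_policy resource set are hypothetical; the credit_policy resource set is the hdfs example from this section.

```python
from fnmatch import fnmatchcase


def matches(resource_set, catalog, schema, table):
    """Return True if catalog/schema/table falls inside the resource set.

    Each level holds a list of patterns; "*" matches any name.
    """
    return (
        any(fnmatchcase(catalog, p) for p in resource_set["catalog"])
        and any(fnmatchcase(schema, p) for p in resource_set["schema"])
        and any(fnmatchcase(table, p) for p in resource_set["table"])
    )


def matching_policies(policies, catalog, schema, table):
    """List the names of all policies whose resource set covers the table."""
    return [
        name
        for name, resource_set in policies.items()
        if matches(resource_set, catalog, schema, table)
    ]


policies = {
    # The example resource set from this section.
    "credit_policy": {
        "catalog": ["hdfs"],
        "schema": ["*"],
        "table": ["credit-info", "cards-info"],
    },
    # A hypothetical second policy that uses wildcards differently.
    "broad_policy": {"catalog": ["*"], "schema": ["finance"], "table": ["*"]},
}

overlap = matching_policies(policies, "hdfs", "finance", "credit-info")
```

Here overlap contains both policy names, although the two resource sets are distinct. This is the situation that fine-grained resource sets help to avoid.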

Privilege sets#

A set of privileges consists of one or more user groups, roles and users, and a set of access types for the specified resource set. Privileges can allow or deny operations.

The catalog, schema, table and column resources, which grant access to resources for queries, have the following access types.

  • SELECT to read data from the resource

  • INSERT to add data to the resource

  • UPDATE to change data in the resource

  • DELETE to remove data from the resource

  • CREATE to create a resource

  • ALTER to alter a resource

  • DROP to remove a resource

  • OWNERSHIP to claim ownership of the resource, which provides complete access

  • IMPERSONATE to impersonate another user, and therefore use the privileges of that user

In addition there are privileges that determine access to queries and their usage, and are therefore of a more general nature.

  • SELECT to list queries.

  • EXECUTE to initiate processing of any query. Without this privilege user action is extremely limited.

  • KILL to stop processing of any query.

Users, groups, and roles#

Users, groups, and roles are sourced from your configured authentication system, ideally a connected LDAP directory, and are used as the target users for each policy.

Column-level authorization#

SEP enforces column-level privileges granted to roles. For example, if a user is only granted access to a subset of table columns, they are only able to query from these columns. If they execute an SQL statement that refers to other columns, the query fails with an error.
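As a rough illustration of this behavior, the following sketch compares the columns a statement refers to with the columns a user is allowed to read. The function name check_columns and the column names are hypothetical, and the error text is not the message SEP produces.

```python
def check_columns(referenced, allowed):
    """Fail if a statement refers to any column outside the allowed set."""
    denied = sorted(set(referenced) - set(allowed))
    if denied:
        raise PermissionError("Access denied to columns: " + ", ".join(denied))


allowed_columns = {"name", "city"}

# A query that only touches granted columns passes.
check_columns(["name", "city"], allowed_columns)
```

Calling check_columns(["name", "salary"], allowed_columns) raises PermissionError, mirroring a query that fails because it refers to a column outside the granted subset.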

Column masking#

SEP’s Apache Ranger integration supports most of the column masking methods that are supported in Hive with Ranger. SEP does not distinguish between upper case letters, lower case letters, and digits when masking. The character x is used for all of these character types.

Note

If an unsupported column masking method is used, MASK_NULL is applied instead.
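The masking behavior can be approximated with a short sketch. It assumes that only ASCII letters and digits are replaced and that all other characters are left unchanged, which this page does not state. The function names are hypothetical.

```python
import re


def mask_value(value):
    """Replace every ASCII letter and digit with "x".

    Upper case, lower case, and digits are all treated the same way.
    Leaving other characters untouched is an assumption of this sketch.
    """
    return re.sub(r"[A-Za-z0-9]", "x", value)


def mask_null(value):
    """Stand-in for MASK_NULL: the value is replaced with NULL (None)."""
    return None


masked = mask_value("Card-4421")
```

In this sketch masked is "xxxx-xxxx", since the letters and digits are replaced and the hyphen is kept.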

Service and catalog integrations#

In addition to enforcing the policies in Apache Ranger, SEP integrates with the Apache Ranger Key Management Service, and has support for AWS Glue Data Catalog, row level filtering and tag-based policies.

Features and use cases#

The following features and use cases are applicable with all Ranger usage.

Hive and other catalog authorization set up#

The Ranger integrations replace any other authorization setup for the data source.

For example, you have to treat it as a replacement for authorization by the user configured for the connection to the data source, or any restrictions in the data source utilized by user impersonation or credential pass-through. It is important to avoid these other configurations and let Ranger manage all access, to keep the overall setup simple and manageable.

When catalogs use the Hive connector, disable the other Hive authorization checks in each catalog properties file. Edit the catalog properties file with the following configuration:

hive.security=allow-all

Controlling access to User Defined Functions with Ranger#

You can use the Ranger system access control to enforce User Defined Function (UDF) policies. A UDF in SEP is deployed as a plugin (Functions) and stored in the SEP global namespace. This global namespace is managed at the system access control level.

This is independent of the global and Hive access control with Ranger and the Privacera Platform.

The Ranger resource hierarchy for all UDF policies requires an associated database (or schema) namespace when creating the policy. Because the global namespace is independent of any connector namespace, this poses a slight challenge to controlling access to UDFs using Ranger. To overcome this, you must specify $presto as the database name in Ranger. This keeps all SEP functions under the $presto database in the Ranger resource hierarchy.

To configure Ranger system access control for UDFs, add the following to a system access control properties file, for example one named etc/access-control-ranger-udf.properties:

access-control.name=ranger-system-access-control
ranger.policy-rest-url=https://ranger-host:6182
ranger.service-name=hive

ranger.authentication-type=KERBEROS
ranger.kerberos-principal=presto-server/presto-server-node@EXAMPLE.COM
ranger.kerberos-keytab=/etc/presto/conf/presto-server.keytab
ranger.plugin-policy-ssl-config-file=/etc/hive/conf/ranger-policymgr-ssl.xml

This additional configuration is needed because the Ranger system access control uses a Ranger client that is independent of the Hive access control. Only one Ranger system access control can be defined, while Hive access control can be configured separately for each Hive catalog. In a scenario with multiple Hive catalogs and multiple Ranger services, only one of those Ranger services can be used to manage the UDF policies.

Note

All Ranger configuration properties supported for global access control with Ranger are supported for Hive access control. Ranger properties related to row filtering or column masking are unsupported in global access control.

Audit#

When Ranger audit is implemented, whenever access is granted or denied through Ranger, an audit event is logged if auditing is enabled in a given resource policy.

Ranger audit is configured in the Ranger-specific file /etc/hive/conf/ranger-hive-audit.xml. Configuring Ranger audit is complex, and outside the scope of Starburst documentation; please refer to your Ranger documentation to learn how to set up audit optimally for your environment.

For audit to work with SEP, the location of the file must be specified in your catalog properties file:

ranger.config-resources=/etc/hive/conf/ranger-hive-audit.xml

Caveat regarding performance

Ranger audits are performed by accessing the internal table system.runtime.queries. Any access to the table is logged.

The Web UI makes heavy use of the queries table. The property ranger.audit.system-runtime-queries.enabled is set to true by default and controls this logging behavior. Using the web interface causes a flood of audit events. Setting the property to false disables this audit logging.

Caching#

Caching is used to improve performance and reduce the number of requests to the Ranger service. Caching is enabled through configuration properties, which can be found in the Ranger installation and configuration page.
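The general idea of a time-based cache can be sketched as follows. This is an illustration only, not SEP's cache: TtlCache, load_groups, and the injected clock are hypothetical names, and the real behavior is controlled by the configuration properties.

```python
import time


class TtlCache:
    """Cache loader results for ttl_seconds; a TTL of 0 disables caching."""

    def __init__(self, ttl_seconds, loader, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._loader = loader      # called on a miss, e.g. a request to Ranger
        self._clock = clock        # injectable so expiry can be simulated
        self._entries = {}         # key -> (time stored, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]        # still fresh: no remote request
        value = self._loader(key)  # missing or expired: load again
        self._entries[key] = (now, value)
        return value


lookups = []


def load_groups(user):
    """Hypothetical remote lookup; records each call for illustration."""
    lookups.append(user)
    return ["finance"]


fake_now = [0.0]
cache = TtlCache(30, load_groups, clock=lambda: fake_now[0])
cache.get("alice")   # first request goes to the loader
cache.get("alice")   # served from the cache
fake_now[0] = 31.0   # simulate 31 seconds passing
cache.get("alice")   # entry expired, loader is called again
```

Three requests result in only two loader calls. The trade-off is that a change on the remote side can stay invisible for up to the configured period.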

Authorization limitations#

Authorization information cannot be accessed by querying tables such as information_schema.roles, information_schema.applicable_roles, information_schema.enabled_roles, and information_schema.table_privileges.

Configuration properties#

The properties listed in this table apply to Ranger-related configurations in system access control properties files as well as catalog files using the Hive connector for Hive access control with Apache Ranger or the Privacera Platform.

Note

Some properties, such as ranger.row-filtering.enabled, are unsupported when Ranger is configured for global access control.

Ranger properties#

Property name

Description

ranger.policy-rest-url

URL address of the Ranger REST service. HTTPS is required when Kerberos authentication is used.

ranger.service-name

SEP Ranger plugin service name.

ranger.authentication-type

Authentication type for SEP connecting to Ranger, BASIC (default) or KERBEROS.

ranger.username

SEP Ranger plugin user name. This property is used when ranger.authentication-type=BASIC is set.

ranger.password

SEP Ranger plugin user password. This property is used when ranger.authentication-type=BASIC is set.

ranger.kerberos-principal

Ranger service Kerberos principal.

ranger.kerberos-keytab

Path to the Ranger service Kerberos keytab file.

ranger.plugin-policy-ssl-config-file

Path to Ranger plugin SSL configuration.

ranger.policy-cache-dir

Path to the Ranger cache directory for policies. This allows loading policies from the cache on startup, even if Ranger Policy Admin is not available at that moment.

ranger.policy-refresh-interval

Interval determining how often authorization policies are refreshed. This is the maximum latency after which changes in Ranger authorization policies become visible in SEP. Default is 30s.

ranger.policy-connection-timeout

Ranger service connection timeout. Default is 120s.

ranger.policy-read-timeout

Ranger service read timeout. Default is 30s.

ranger.user-group-source

Source of user group information, RANGER (default) or STARBURST.

ranger.cache-ttl

Period for how long group mapping information is cached in SEP. 0ms disables the cache. If ranger.user-group-source is STARBURST, controls the period between user sync operations for a single user. Default is 30s.

ranger.cache-refresh-interval

Interval at which cached group mapping information is refreshed in SEP. Any value greater than ranger.cache-ttl disables the refresh. Default is disabled, 0ms.

ranger.row-filtering.enabled

To enable row filtering, set this flag to true. This setting is not supported when Ranger is configured for global access control (where row filtering is always enabled), and causes cluster startup to fail if set. Note that there are semantic differences between the SEP and HiveQL SQL variants. Default is false.

ranger.wild-card-resource-matching-for-row-filtering

To enable resource wildcard matching for row filtering, set this flag to true. When two policies match a single resource, the one without wildcards is used. When multiple wildcard policies match, it is undetermined which one is used. This property is ignored when Ranger is configured for global access control. Default is false.

ranger.wild-card-resource-matching-for-column-masking

To enable resource wildcard matching for column masking, set this flag to true. When two policies match a single resource, the one without wildcards is used. When multiple wildcard policies match, it is undetermined which one is used. This property is ignored when Ranger is configured for global access control. Default is false.

ranger.config-resources

Additional XML configuration files that are read before applying your SEP Ranger configuration. Useful for reusing an existing Hive-level Ranger configuration, for example a Ranger audit configuration.

ranger.sql.enabled

Enable Ranger policy management with SQL, which is supported for Hive access control only. Default is true.

Ensure Ranger works with TLS#

If your organization implements TLS for network traffic between SEP and Ranger, you must ensure that both are correctly configured. You must add the SEP certificate to a JKS keystore and configure it in the Ranger SSL configuration file.

All catalogs accessing Ranger must define the ranger.plugin-policy-ssl-config-file property and point to the XML configuration file:

ranger.plugin-policy-ssl-config-file=/etc/starburst/ranger-policymgr-ssl.xml

If Ranger and SEP use globally trusted certificates, you can use the following Ranger SSL configuration file:

<configuration>
<!--  The following properties are used for 2-way SSL client server validation -->
  <property>
    <name>xasecure.policymgr.clientssl.keystore</name>
    <value>/etc/starburst/sb-admin-keystore.jks</value><!--coordinator's cert goes here-->
  </property>
  <property>
    <name>xasecure.policymgr.clientssl.keystore.credential.file</name>
    <value>jceks://file/etc/starburst/sb-admin-keystore.jceks</value><!--password store for the coordinator's JKS file-->
  </property>
</configuration>

Without globally trusted certificates, you need to add Ranger’s certificate to a JKS truststore and link it in the XML file:

<configuration>
    <!--  The following properties are used for 2-way SSL client server validation -->
    <property>
        <name>xasecure.policymgr.clientssl.keystore</name>
        <!-- Keystore file with the coordinator certificate and private key. -->
        <value>/etc/hive/conf/ranger-plugin-keystore.jks</value>
    </property>
    <property>
        <name>xasecure.policymgr.clientssl.truststore</name>
        <!-- Truststore file with the Ranger certificate. -->
        <value>/etc/hive/conf/ranger-plugin-truststore.jks</value>
    </property>
    <property>
        <name>xasecure.policymgr.clientssl.keystore.credential.file</name>
        <!-- This file holds the password for the Starburst keystore -->
        <value>jceks://file/etc/ranger/hivedev/cred.jceks</value>
    </property>
    <property>
        <name>xasecure.policymgr.clientssl.truststore.credential.file</name>
        <value>jceks://file/etc/ranger/hivedev/cred.jceks</value>
    </property>
</configuration>

More information about working with certificates is available in JKS files and PEM files. Avoid renaming JCEKS files after generating them, since that invalidates them.
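Before restarting the cluster, it can help to check that the SSL configuration file defines every property you expect. The following sketch parses a file in the format shown above with the Python standard library. The helper names are hypothetical, and the list of required properties simply mirrors the second example in this section.

```python
import xml.etree.ElementTree as ET

REQUIRED = [
    "xasecure.policymgr.clientssl.keystore",
    "xasecure.policymgr.clientssl.truststore",
    "xasecure.policymgr.clientssl.keystore.credential.file",
    "xasecure.policymgr.clientssl.truststore.credential.file",
]


def read_ssl_properties(xml_text):
    """Return a dict of name -> value for every <property> element."""
    root = ET.fromstring(xml_text)
    return {
        (prop.findtext("name") or "").strip(): (prop.findtext("value") or "").strip()
        for prop in root.findall("property")
    }


def missing_properties(props, required=REQUIRED):
    """List required property names that are absent or have an empty value."""
    return [name for name in required if not props.get(name)]


# Deliberately incomplete sample, to show what the check reports.
SAMPLE = """<configuration>
  <property>
    <name>xasecure.policymgr.clientssl.keystore</name>
    <value>/etc/hive/conf/ranger-plugin-keystore.jks</value>
  </property>
  <property>
    <name>xasecure.policymgr.clientssl.truststore</name>
    <value>/etc/hive/conf/ranger-plugin-truststore.jks</value>
  </property>
</configuration>"""

missing = missing_properties(read_ssl_properties(SAMPLE))
```

For the sample, missing lists the two credential file properties. For a real file, read its content first, for example with pathlib.Path(path).read_text().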

When using the SEP Helm charts, you have to configure the XML file and the access control file inline in the YAML files.

Ensuring Ranger works with your authorization service#

You need to configure SEP to work with the authentication service used by Ranger. Ranger needs to have information about your users, groups, and roles in your authentication system. There are two ways of getting that information into Ranger:

  • SEP user sync - SEP pushes authenticated user data to Ranger directly. This is simpler to set up, as it only requires setting the ranger.user-group-source configuration property.

  • Ranger user sync - a separate ETL process from authentication service to Ranger. Requires separate setup of the user sync process.

While SEP does offer Kerberos support, SEP encourages the use of LDAP. The following configuration property is provided:

LDAP#

If your organization uses an LDAP system for user and group information, Ranger can use that information to define role-based access to catalogs using any connector, as well as a number of other system resources. Policies in Ranger define access and authorization, and are created with the Ranger user interface. Users, groups, and roles are sourced from your connected LDAP directory and are used to target users for a Ranger policy. Each policy combines user and group information with a resource and access rights to the resource.

With the K8s and AWS installation methods, all details are already configured. For existing Ranger usage or manual installation, you must ensure that Ranger has data from your LDAP directory provider, and that a synchronization process (either SEP or Ranger user sync) is in place.

The process of connecting your existing Ranger installation depends on your particular LDAP implementation as well as your Ranger configuration. Learn more about that in the LDAP Authentication page.

Kerberos#

SEP can use Kerberos authentication, and the Ranger integration also supports Kerberos.

Warning

Most organizations that use Kerberos also use LDAP. We strongly encourage you to use LDAP instead of Kerberos, due to the relative unreliability of Kerberos servers, their lack of clear error messaging, and their rigid OS and JVM dependencies.

A Ranger sys admin user (a user with the role ROLE_SYS_ADMIN) must exist that matches the SEP Kerberos principal ranger.kerberos-principal when KERBEROS authentication is used, or the SEP Ranger plugin username ranger.username and password ranger.password when BASIC authentication is used.

The SEP Kerberos principal is translated to a Ranger user name via the Hadoop auth-to-local rules from core-site.xml.
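The following sketch illustrates the translation for the simplest case. It assumes that only Hadoop's DEFAULT rule applies, which keeps the first component of the principal when the realm equals the default realm. Custom RULE entries in core-site.xml are not modeled, and the function name is hypothetical.

```python
def default_short_name(principal, default_realm):
    """Approximate Hadoop's DEFAULT auth-to-local rule.

    "service/host@REALM" becomes "service" when REALM is the default realm.
    Principals from other realms are rejected in this sketch.
    """
    name, _, realm = principal.partition("@")
    if realm != default_realm:
        raise ValueError(f"no rule applies to principal {principal}")
    return name.split("/", 1)[0]


ranger_user = default_short_name(
    "presto-server/presto-server-node@EXAMPLE.COM", "EXAMPLE.COM"
)
```

With the example principal used earlier on this page, ranger_user is presto-server, so under this assumption a Ranger user with that name would need to exist.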

Note

Ranger version 2.1.0 removes the possibility to connect to Kerberized Ranger using basic user and password authentication. You have to add the following configuration to your Ranger core-site.xml file to restore this possibility by allowing unauthenticated access:

<property>
  <name>ranger.admin.allow.unauthenticated.access</name>
  <value>true</value>
</property>

Alternatively, you can configure SEP to authenticate to Ranger using Kerberos.

Starburst Ranger CLI#

You can use the Starburst Ranger CLI to manage integration of SEP with Apache Ranger or the Privacera Platform for the following tasks:

The command line application is an executable Java archive that requires Java 11 or higher to be available on the system path. You can download it from Starburst and install it with the following steps on Linux or macOS.

  • Ensure the computer is able to reach the Ranger server via HTTP, since the CLI interacts with the REST API. This can be the coordinator, a worker in the cluster, or any other computer.

  • Verify Java with java -version

  • Move the binary to a directory in your path, such as ~/bin and rename it.

    mv starburst-ranger-cli-*-executable.jar ~/bin/starburst-ranger-cli
    
  • Verify the folder is on the path.

    echo $PATH
    
  • If necessary, add the folder.

    export PATH=~/bin:$PATH
    
  • Now you can run the help command to verify the CLI works.

    starburst-ranger-cli help
    
  • The resulting output is similar to the following:

    Starburst Ranger command line interface
    USAGE:
    starburst-ranger-cli [--properties=<configFile>] [-p=<String=String>]... [COMMAND]
    ...
    

The help command can also provide details about the other commands and their specific options, if you append help to the desired command, with a few examples shown in the following block:

starburst-ranger-cli help
starburst-ranger-cli user help
starburst-ranger-cli service-definition help
starburst-ranger-cli group create help
starburst-ranger-cli user create help

Windows installation is supported as well and requires similar commands. You can also run the application directly with Java on Linux, macOS or Windows.

java -jar starburst-ranger-cli-*-executable.jar

You have to supply the connection details from SEP to Ranger in a properties file. Typically you can simply use the Ranger access control properties file by copying it to the computer running the CLI. Alternatively you can use individual properties as command line options.

  • Use the --properties option to specify the full path to a .properties file that contains one or more key=value pairs, each on its own line.

  • Use the -p option for each property separately with the format -p=key=value.

In the following examples these properties are usually omitted, but they are necessary to find the Ranger endpoint.
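If you prefer individual options over a properties file, the conversion can be scripted. The sketch below reads simple key=value lines and produces the equivalent -p options. It is a deliberately reduced parser that ignores other features of the Java properties format, such as line continuations, and the function names are hypothetical.

```python
def read_properties(text):
    """Parse simple key=value lines, skipping blank lines and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def as_cli_options(props):
    """Turn each property into the -p=key=value form accepted by the CLI."""
    return [f"-p={key}={value}" for key, value in props.items()]


properties_text = """\
access-control.name=ranger-system-access-control
ranger.policy-rest-url=https://ranger-host:6182
ranger.service-name=hive
"""

options = as_cli_options(read_properties(properties_text))
```

The resulting list can be appended to a CLI invocation, for example when calling the tool from a script through subprocess.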

Ranger user group management#

You can manage user groups in Ranger with the CLI. Properties are used to provide the details for Ranger access.

The following operations are available:

  • create a group

  • get a list of all groups

  • get a list of all groups a certain user belongs to

  • delete a group

The CLI uses access control properties and positional parameters to pass group names with the following syntax:

starburst-ranger-cli group get [username]
starburst-ranger-cli group create group1 [group2] ...
starburst-ranger-cli group delete group1 [group2] ...

The following complete example gets all groups in Ranger, as specified by the properties file, and displays them:

starburst-ranger-cli group get --properties=ranger-access-control.properties

If a username is specified, only the groups of the user are displayed:

starburst-ranger-cli group get --properties=ranger-access-control.properties myusername

You can create one or multiple groups, and the identifier of each created group is displayed as confirmation:

starburst-ranger-cli group create group1 [group2] ...

Deleting groups is similar:

starburst-ranger-cli group delete group1 [group2] ...

Ranger user management#

You can manage users in Ranger with the CLI. Properties are used to provide the details for Ranger access.

The following operations are available:

  • create a user

  • get user details

  • delete a user

The CLI uses access control properties and a mixture of positional parameters and options to pass user information with the following syntax:

starburst-ranger-cli user get
starburst-ranger-cli user create
starburst-ranger-cli user delete

A full example to get a user can look like this:

starburst-ranger-cli user get --properties=ranger-access-control.properties username

Creating a user can be done in two ways:

  • a basic user created from a default template:

starburst-ranger-cli user create user1 [user2] ...

  • using a JSON file, such as alice.json, with the following syntax:

{
  "name": "alice",
  "firstName": "Alice",
  "lastName": "Wonderland",
  "emailAddress": "alice@example.com",
  "password": "not@trivialP225w0rd",
  "description": "She went down the rabbit hole.",
  "groups": ["admin", "finance"],
  "roles": ["user", "account_owner"]
}

The file is passed with the -f or --from-file option:

starburst-ranger-cli user create -f=alice.json

If any group from groups doesn’t exist, it is automatically created.

It’s also possible to create multiple users using a file with a list of user definitions:

{
  "users": [{
    "name": "alice",
    "firstName": "Alice",
    "lastName": "Wonderland",
    "emailAddress": "alice@example.com",
    "password": "not@trivialP225w0rd",
    "description": "She went down the rabbit hole.",
    "groups": ["admin", "finance"],
    "roles": ["user", "account_owner"]
  }, {
    "name": "bob",
    "firstName": "Bob's firstName",
    "lastName": "Bob's lastName",
    "emailAddress": "bob@bobiverse.com",
    "password": "ForW3AreM@ny"
  }]
}
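When many users need to be created, a file like the one above can be generated rather than written by hand. The following sketch builds the same structure with the field names shown in the examples. The helper names and the placeholder passwords are hypothetical.

```python
import json


def user_entry(name, first_name, last_name, email, password, groups=(), roles=()):
    """Build one user definition using the field names shown above."""
    entry = {
        "name": name,
        "firstName": first_name,
        "lastName": last_name,
        "emailAddress": email,
        "password": password,
    }
    # groups and roles are left out when empty, as in the bob example
    if groups:
        entry["groups"] = list(groups)
    if roles:
        entry["roles"] = list(roles)
    return entry


def users_document(entries):
    """Serialize a list of user definitions into the multi-user format."""
    return json.dumps({"users": list(entries)}, indent=2)


document = users_document([
    user_entry("alice", "Alice", "Wonderland", "alice@example.com",
               "placeholder-1", groups=["admin", "finance"]),
    user_entry("bob", "Bob", "Builder", "bob@example.com", "placeholder-2"),
])
```

Write document to a file, for example with pathlib.Path("users.json").write_text(document), and pass that file to the CLI with the -f option.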

Service definition management#

You can find information about creating and overriding the service definition in the sections about installing and upgrading the SEP Ranger plugin.

Ranger REST API#

Apache Ranger includes a REST API that can be used for automating and troubleshooting your configuration and setup. Use it with caution and reference the API documentation as needed.
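As a starting point for such automation, the following sketch prepares a request that lists the policies of a Ranger service. The endpoint path /service/public/v2/api/service/<service-name>/policy is an assumption based on the Ranger public v2 API and must be verified against the API documentation for your Ranger version. The host, service name, and credentials are placeholders, and the sketch only builds the request without sending it.

```python
import base64
import urllib.parse
import urllib.request


def policy_list_request(base_url, service_name, username, password):
    """Build a GET request for the policies of one Ranger service.

    The endpoint path is an assumption; check it against the API
    documentation for your Ranger version before relying on it.
    """
    path = "/service/public/v2/api/service/{}/policy".format(
        urllib.parse.quote(service_name, safe="")
    )
    request = urllib.request.Request(base_url.rstrip("/") + path, method="GET")
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    request.add_header("Accept", "application/json")
    return request


request = policy_list_request("https://ranger-host:6182", "hive", "admin", "secret")
```

To send the request, pass it to urllib.request.urlopen(request) from a computer that can reach the Ranger server and parse the response body with json.loads.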