Object storage access control with Apache Ranger#

You can use the Apache Ranger integration with SEP to control access to Hive, Delta Lake, and Iceberg data sources configured in any catalog using the SEP Hive, Delta Lake, or Iceberg connectors.

This is specifically useful in the following scenarios:

  • You already use Apache Ranger to control access for these data sources

  • You use SQL standard-based authorization

If you want to control access for other catalogs and connectors with Ranger, you need to adopt global access control.

Note

Hive, Delta Lake, and Iceberg access control with Apache Ranger requires a valid Starburst license.

Installation#

Verify you fulfill the Apache Ranger requirements.

If you do not yet have Ranger running and configured, you can take advantage of the supported installation methods for global access control with Apache Ranger.

Configuration#

To use Ranger with Hive, Delta Lake, and Iceberg catalogs, you must set the appropriate property for your setup in the catalog configuration file:

For Hive catalogs:

hive.security=allow-all

For Delta Lake catalogs:

delta.security=allow-all

For Iceberg catalogs:

iceberg.security=allow-all

Additionally, you must have the access-control.properties file as well as the hive.security file for Hive catalogs. The access-control.properties and hive.security files are configured to use Apache Ranger for authorization and Kerberos for authentication.

An example access-control.properties file:

access-control.name=hive-based-ranger

ranger.hive-based.catalogs=hive
ranger.hive-based.fallback-access-control=ALLOW_ALL

ranger.policy-rest-url=https://ranger-host:6182
ranger.service-name=hive

ranger.authentication-type=KERBEROS
ranger.kerberos-principal=sep-server/sep-server-node@EXAMPLE.COM
ranger.kerberos-keytab=/etc/sep/conf/sep-server.keytab
ranger.plugin-policy-ssl-config-file=/etc/hive/conf/ranger-policymgr-ssl.xml

An example hive.properties file:

connector.name=hive
hive.metastore.uri=thrift://hive-metastore-node:9083

hive.metastore.authentication.type=KERBEROS
hive.metastore.service.principal=hive/hive-metastore-node@EXAMPLE.COM
hive.metastore.client.principal=hive/sep-server-node@EXAMPLE.COM
hive.metastore.client.keytab=/etc/hive/conf/hive.keytab

hive.hdfs.authentication.type=KERBEROS
hive.hdfs.impersonation.enabled=false
hive.hdfs.trino.principal=hdfs/sep-server-node@EXAMPLE.COM
hive.hdfs.trino.keytab=/etc/hadoop/conf/hdfs.keytab

hive.security=allow-all
System access control configuration properties#

Property name

Description

access-control.name

Set to hive-based-ranger to use Apache Ranger with your catalogs.

ranger.hive-based.catalogs

Specifies the comma-separated names of catalogs to be secured by Hive Ranger policies.

ranger.hive-based.fallback-access-control

Fallback access control to control resources that are out of scope of Hive Ranger policies. There are three options:

  • DENY-ALL: all resources that are not related to controlled catalogs are denied.

  • ALLOW_ALL: all resources that are not related to controlled catalogs are allowed.

  • PRESTO_DEFAULT: Similar to ALLOW_ALL, however it also forbids user impersonating and writing system information such as marking nodes offline, or changing runtime flags.

ranger.hive-based.query-access-control.enabled

Determines if access to query execution and killing should be controlled. Default is true.

ranger.hive-based.udf-access-control.enabled

Determines if access to user defined functions should be controlled. Default is false.

ranger.hive-based.database-name-provider

Controls how the Ranger database name is configured for Ranger resources. The following options are supported:

  • SCHEMA: Uses the schema name as the Ranger database name. This is the default option.

  • CATALOG_SCHEMA: Concatenates the catalog name and schema name with a configurable separator, such as _. This enables namespace-aware Ranger policies by using database names like catalog_schema instead of just schema.

See Database name provider for more details.

ranger.hive-based.database-name-provider.catalog-schema.separator

Configures the separator used when ranger.hive-based.database-name-provider is set to CATALOG_SCHEMA. Default is _.

materialized-views.run-as-invoker-default-value

Specifies default value for the run-as-invoker property.

See additional configuration properties available in Ranger.

Database name provider#

Catalog and schema names are mapped to Ranger database names when evaluating policies. The database name provider lets you customize this mapping. This is helpful for scenarios when pre-existing Hive Ranger database naming had unusual mapping, such as mapping SEP’s catalog.schema.table to Ranger’s catalog_schema.table.

The following two options are supported:

  • SCHEMA: Uses the schema name as the Ranger database name. This is the default option.

  • CATALOG_SCHEMA: Concatenates the catalog name and schema name with a configurable separator, such as _. This enables namespace-aware Ranger policies by using database names like catalog_schema instead of just schema.

For example, with the CATALOG_SCHEMA option, a fully qualified name of catalog.schema.table maps to a Ranger database name of catalog_schema and table name of table.

The following is an example access-control.properties configuration that uses the CATALOG_SCHEMA database name provider option:

ranger.hive-based.catalogs=catalog_schema_database
ranger.hive-based.fallback-access-control=allow-all
ranger.hive-based.database-name-provider=CATALOG_SCHEMA

Warning

Clusters that need unique ranger.hive-based.database-name-provider values for different ranger.hive-based.catalogs require separate access control configurations for those catalogs.

Resource and privilege relationship limitations#

The integration of Ranger with the Hive, Delta Lake, and Iceberg connectors does not support any kind of relation between a resource, such as a table, and the granted access. This is particularly evident when newly created resources are treated with the access rights of the existing policies.

For example, as a user without full administrative access (ALL PRIVILEGES) you can end up in a situation where you create a new table, but can not grant access to that table to other users. As a workaround you can create global policies that use matching patterns and define the desired default access.

If you want all tables ending with -reporting to be accessible by everyone, you can create a policy that uses the value *-reporting for the table. If you use * for database and * for column, you can grant a specific privilege such as reporting a table and all of its columns in any database.

Alternatively, you can grant all privileges to anyone creating new objects and require privileges to grant access to other users. This policy might be too open for your use case, depending on your security policies, the data, the number of users, and other aspects.

Role management limitations#

SQL supports the SET ROLE [role] statement to enable or disable a role. This feature is not supported in Ranger. All user roles are always enabled. Querying information_schema.applicable_roles and information_schema.enabled_roles yields the same results.